Developing in Containers using Visual Studio Code (T-SQL Tuesday #140)
This month’s T-SQL Tuesday is hosted by Anthony Nocentino (@nocentino). He wants to know what we’ve been up to with containers. Perfect timing, because I have just spent the last couple of weeks learning how to develop in containers using Visual Studio Code! I was planning to write this for myself anyway, but perhaps it can be interesting for others as well 🤓
What is the use case?
One of my clients are using dbt (Data Build Tool) for their data transformations. In short, this means that developers write data transformations in SQL as SELECT statements. All SQL code can be combined with Jinja templates. Inside of these Jinja templates, developers can reference other tables, use control logic, or define common SQL code snippets as reusable macros. Dbt then compiles the SQL+Jinja code into pure SQL.
For example, if a macro looks like this:
{% macro convert_date_to_int(column_name ) %}
CAST(CONVERT(CHAR(8), {{ column_name }}, 112) AS INT)
{% endmacro %}
And a developer writes something like this:
SELECT
order_id AS OrderID,
{{ convert_date_to_int('order_date') }} AS OrderDate
FROM {{ ref('stg_orders') }}
Dbt will compile everything into this:
SELECT
order_id AS OrderID,
CAST(CONVERT(CHAR(8), order_date, 112) AS INT) AS OrderDate
FROM stg.orders
After that, dbt uses these SELECT statements and turns them into actual tables and views in the data warehouse (or data lakehouse). Dbt can also run code tests, generate documentation, and produce lineage graphs showing dependencies between tables.
Don’t ask me how it does all of those things, though, because the whole point of using dbt is that I don’t have to understand it. I just write SQL code with a sprinkle of Jinja 😂 I do know how to ask dbt to do these things, though. For example:
$ dbt compile
But wait! Where the heck do I type that command?
Why use containers?
There are two ways to develop in dbt. You can either develop online using the web interface, or develop locally using the dbt Command Line Interface (CLI) to write commands like the example above. My client went with the option to develop locally.
That means that all developers, both full-time employees and external consultants like myself, with very different backgrounds and skills ranging from “never used a terminal” to “can write bash and git commands in my sleep”, need to install the CLI with all required dependencies on different laptops running different operating systems with different corporate policies while ensuring that everything is configured the same way and updated at the same time.
(*takes a deep breath* 😅)
My client solved this by using a container-based development environment. One team is responsible for the entire data platform, the overall architecture, the dev/test/prod environments, continuous integration and continuous deployment (CI/CD), and the development processes. They created and maintain the development container, which contains (hehe) everything developers need to develop in dbt.
The development container is distributed to developers through a git repository which also contains the dbt code. That means that when a developer gets the latest version of the code, they also get the latest version of the development environment.
I see some huge benefits to this. By using containers, we know that the development environment is consistent and always updated. New developers can be up and running in a fairly short amount of time. And there are only a few things to install locally.
What needs to be installed locally?
- Windows Subsystem for Linux (WSL 2)
- Docker Desktop (configured to use WSL 2)
- Git
- Visual Studio Code
- The Remote - Containers extension for Visual Studio Code
That’s it!
How to develop in containers using Visual Studio Code?
I can’t show you my client’s project, but I can show you how to get started on your own 😃
Start with a sample project
Once you have installed Visual Studio Code with the Remote - Containers extension, you will see a new Remote Explorer button on the menu, and a new green button on the status bar for working with remote windows:
If you open the Remote Explorer, you can choose to try a sample:
This opens the Command Palette where you can choose from a list of sample repositories to clone in a container volume, for example .NET Core or Python:
I don’t use .NET or Python, however, so I’m going to try something slightly different.
Start with a container configuration file
If you instead create and open a folder, the status bar will turn blue. It will also give you a new option to reopen the current folder in a container:
This time, it opens the Command Palette for adding development container configuration files. Since these are not full sample repositories, you have a lot more options, like the Azure CLI. (I’m going to choose the Azure CLI for these examples because it’s the most similar to how I work with the dbt CLI.)
After clicking on Azure CLI, a devcontainer.json file is added to the project. Since your folder now contains a Dev Container configuration file, you are prompted to reopen in container:
When you click reopen in container, Visual Studio Code starts building the dev container. You can click on show log to follow the progress:
The log in the terminal shows all the steps that happen behind the scenes. All necessary components are downloaded for you!
Once the dev container is ready, you will see that your folder now contains a .devcontainer subfolder with all the configurations. (In my client’s project, we also have all the dbt code in here. Developers work on the dbt code, and the other team manages everything inside .devcontainer.)
To open a new terminal, you can use the Terminal menu or the command palette, or click the + button:
Woohoo! There’s that little dollar sign that I asked about under my $ dbt compile code example further up in this blog post 😄 If this was a dbt project, I could type that command here to compile all my SQL+Jinja code into pure SQL. But since this is the Azure CLI and not the dbt CLI…
…we can use commands like az login instead, which will open a new browser window…
…or commands like az datafactory list…
…which will show all of our Azure Data Factories! 🤓
Once I’m done working for the day, I can just close Visual Studio Code.
Summary
Developing in containers using Visual Studio Code feels like a gamechanger compared to having to fill up my laptop with all sorts of applications and utilities! I had never worked this way before, I had only tried running SQL Server in Docker, so it took me a little while to learn and get used to. But now I don’t really want to go back to the “old way” 😊
Thank you to Anthony Nocentino (@nocentino) for coming up with this T-SQL Tuesday topic so I had an extra reason to share some of my experiences!
About the Author
Cathrine Wilhelmsen is a Microsoft Data Platform MVP, international speaker, author, blogger, organizer, and chronic volunteer. She loves data and coding, as well as teaching and sharing knowledge - oh, and sci-fi, gaming, coffee and chocolate 🤓