My journey of discovering the value of Bazel Build

Introduction to Bazel and its advantages

My path into software development and build automation is somewhat unconventional: my background is as a Technical Artist in game development. Throughout my career, I worked on complex MMO projects. My role was to make sure that the programmers were writing code that enabled the artists and designers on the team to reach their goals and, at the same time, that the art content created by the artists conformed to the engineering team's requirements so that the product was as optimized as possible.

When I was starting out, I truly believed that the divide between design and engineering was like two warring tribes locked in an endless conflict over fundamentally different beliefs. I had countless conversations with artists who explained to me, defeated, that the programmers seemed to be deliberately making the art look bad by reducing its quality in the game, disabling features, and limiting memory budgets. I'd then have the same conversations with the programmers, who just couldn't understand why the artists would make an active effort to ruin the game's performance, bypass best practices, and make it impossible to create a shippable product.

My role at the studio was closer to a diplomat than a developer.

However, as I gained more experience working in this environment, I started to understand that both factions were truly working towards the same goal: shipping the best product possible to the customer. The problem wasn't a religious difference; they just didn't speak the same language. Once I had figured this out, I got really excited and thought the obvious solution was to teach them each other's languages. The designers would be trained in basic programming skills, and the programmers would get design lessons. Surely, this would create a common ground where all the inefficiency and friction would go away. But this was not the case. Instead of creating a common language, I had started turning all my developers into generalists (like myself), when what you really want is for people to focus on their specializations.

The biggest thing I took away from the cross-training project was that when developers make a change that has a negative effect on the product, they feel bad. It's even worse when the negative impact lands in an area they don't fully understand. It's like being in a foreign country and breaking a social norm. You had no ill intent, and you know you did something wrong, but you can't quite figure out what. And everyone around you speaks a foreign language, so you can't communicate.

I learned that everyone is trying to do their best and wants things to run as smoothly as possible. When a negative change of this type goes into the product, it's usually because the developer didn't realize the impact it would have; they simply had no way of testing it.

So, armed with this knowledge, I realized that computers are great at automatically checking for errors, validating changes, and making sure that nothing goes wrong. I would teach a computer both languages and have it act as the great translator. A designer makes a change, and the great translator goes through it and checks any values that might be relevant (size of an image, dimensions, bit depth, etc.). It then generates a report that highlights what could go wrong and sends it to the relevant team members. I started to lean heavily on continuous integration and automated testing. Any time a developer made a change, our CI system would pick it up and run a series of tests to make sure nothing was wrong, and when it gave a thumbs-up, the change would get merged into the mainline of the product.

Even though this was a great idea (and I highly recommend automated testing, continuous integration, and continuous delivery for all software projects), it still didn't address my underlying issue. We now had a path towards great test coverage for our product, but I didn't feel like I had solved the right problem. The core issue was that we had started building this from the point of view of continuous integration. When a change happened in the codebase, we would fetch that change, figure out what type of test it should run, check whether this test had any dependencies that must run first, run the prerequisite tasks, and then finally run the test. We ended up with a very complex system that needed dedicated developers to maintain and develop it. I had basically created another software product within the company, where the build engineers were yet another tribe who spoke yet another language but shared the same ultimate goal.

This is when I took a step back and started to think about what I really needed to solve this problem. So, I wrote down what I actually wanted:

I want an environment where all my developers can see the impact of their work on the final product.
— Gisli's realization

This was a pretty tall order, but I was determined to create this environment. That's when I stumbled upon Bazel, the open-source version of Blaze, the build tool Google uses internally. I was fascinated by how a company like Google was able not only to maintain but also to actively develop massive software projects like Gmail and Google Docs.

From my very first tests, I could see that it is easy to use: all you need is the Bazel software installed on your workstation, and it does everything else for you. I don't have to worry whether my developers have the right Python interpreter on their workstations or the correct packages installed. Bazel maintains an isolated workspace where we can be sure that actions are run hermetically. Running build actions hermetically is a magical concept. A hermetic build action runs exactly the same way if you run it twice, and it runs exactly the same way on any workstation. This also means that if a build action has been run once and neither its inputs nor its tools have changed, the result will be the same. This enables Bazel's powerful caching features, so you only ever need to run the build actions where something has changed. The result is a build that is both fast and correct.

For me, the biggest hurdle was getting started and maintaining it. There were quite a few foreign concepts I had to learn when I started out, and it took me much longer to get started than it should have. This is why I'm writing this book: to make sure that you can have fast and correct builds and an easy time maintaining them.

Understanding Bazel

In short, Bazel is a build system designed to make builds. Bazel has many intricate and advanced features that we’ll look into in detail in later chapters of the book. However, all these features are there to support this one simple goal:

Bazel makes builds
— Gisli's realization

Bazel itself doesn’t compile any code, package any outputs, or deploy anything to a CDN. All it does is make builds. So, what does making a build mean? Building something is the process of converting one or more ingredients, using one or more tools, into one or more outputs. Building is a concept much broader than computer science. For example, a carpenter with a stack of lumber (the ingredients), wielding a saw and a hammer (the tools), creates a house, and we say that the carpenter built a house. Similarly, a chef can take vegetables, spices, and water (the ingredients), use a knife to chop the vegetables, a measuring cup for the spices, and a pot to cook in (the tools), and build a delicious meal. Even though we usually say that a chef cooks a meal, the process is, in fact, the same as the carpenter’s.

As a software developer, you might be building a GUI application, web service, library, video game, or any other type of software. This process involves taking one or more files containing source assets, such as source code (the ingredients), and running them through one or more tools to get the intended output. In the software development world, building can involve thousands of sources of different types, written in many different languages. The process can include a large number of tools, each with a specific purpose, such as parsing, compiling, linking, interpreting, compressing, downloading, rendering, solving, or anything else that converts a set of inputs into some output.

Bazel enables a developer to create a controlled environment around all the details that go into building.

The Chef’s and the Programmer’s “Mise en Place”

You might be asking yourself, “What do a chef and a programmer have in common, and does this analogy even make sense?” The two are more alike than you might think. Let’s say a chef is tasked with making a delicious soup. They have their recipe, know their way around the kitchen, and understand how to use all the tools. An inexperienced chef might approach the task in the following way:

The first step is to start with 300g of finely chopped onions. Where did I leave the knife? Do I have any fresh onions? How finely do I chop the onions?

The inexperienced chef will hopefully be able to accomplish the task, but highly inefficiently. It's not just that it'll take more time; there are also many variables that they need to react to on the fly.

The inexperienced programmer behaves in the same way. Let's imagine the programmer is contributing to a cookbook app.

The first step in adding a feature to the app is getting it to run in the local environment. Start by checking if Python 3.11 is on your computer. If it is, make sure that the correct packages are installed. The app also needs a database to be installed and running at a specific address locally…

Both the inexperienced programmer and the inexperienced chef make the same fundamental mistake: they don’t factor in the impact that variation in the environment will have on the outcome of their work.

An experienced chef prepares everything ahead of time so that, once the cooking starts, fewer things can go wrong and all their attention can go to the most important and time-sensitive part of the process. “Mise en place” is a concept taught to chefs in training; it translates roughly to “putting in place.” The idea is that before you start cooking, you should prepare everything you can. Fewer things can go wrong, and your meal is more likely to be a success.

The more experienced programmers become, the more they think about “mise en place” principles in their projects. The application can be set up to run in a virtual environment, and dependencies can be included in source control instead of having to be installed. The fewer barriers and variables there are in running the application, the more time they can spend creating value.

Things get really interesting when we scale the project. A chef managing a busy restaurant will have multiple dishes on the menu and staff with different responsibilities. The chef needs to be sure that the dishes coming out of the kitchen are perfect every time. This is almost impossible because there are so many variables at play, and there’s no way to have perfect control over everything.

Scaling a software project is similar, but it can grow much larger. Teams can have thousands of members, each with different skill sets, responsibilities, and focus. They can even be in different time zones and not speak the same language. The number of ingredients in a complex software project can be in the hundreds of thousands, and the team can make hundreds of changes to these source files a day. The software project does, however, have some advantages over the restaurant.

Table 1.2: Where the programmer can scale but the chef cannot

You might ask yourself, does it really matter? If there is a little bit too much salt in the soup or the dessert was baked for a little bit too long, will the customer care? Probably not if the food is consistently good. If the food is a bit off once, we usually forgive and move on.

However, in software development, these small variations can lead to bugs that range from the smallest nuances to catastrophic failures. What if a developer finishes a feature with Python 3.11, but the application is deployed with Python 3.10? It's probably going to be fine. Maybe a warning just shows up in the log when the application is deployed. Or maybe there is a memory leak that eventually crashes the application. These small variations add up, especially as teams and codebases scale.

Bazel is designed to remove these variables from the environment and make sure that when you run an action, it runs the same everywhere.

I can’t imagine scaling a restaurant to hundreds of menu items, thousands of staff using tens of thousands of ingredients, and consistently delivering perfect quality in each dish. This is possible in software development, and Bazel is a fantastic tool to enable it.

Basic Bazel Action

Now that we've gone through all these examples, we can conclude that Bazel does only one thing: it takes a set of inputs and runs an action that uses a tool to convert those inputs into outputs.

Figure 1.1: Basic Bazel action
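To make the shape of an action concrete, here is a minimal Python sketch. It is not how Bazel is implemented; the `Action` class, the `Files` alias, and the toy `uppercase_tool` are hypothetical names chosen for illustration. Files are modeled as a dictionary from path to contents, and the action hands its tool only the inputs it has declared.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model: a set of files is a mapping from path to file contents.
Files = dict[str, bytes]


@dataclass(frozen=True)
class Action:
    """A build action: declared inputs go through a tool and become outputs."""

    inputs: tuple[str, ...]         # paths the action is allowed to read
    tool: Callable[[Files], Files]  # converts input files into output files
    outputs: tuple[str, ...]        # paths the action promises to produce

    def run(self, files: Files) -> Files:
        # Hand the tool only the declared inputs, nothing else from `files`.
        declared = {path: files[path] for path in self.inputs}
        produced = self.tool(declared)
        # The tool must produce exactly the outputs the action declared.
        if set(produced) != set(self.outputs):
            raise ValueError(f"expected {self.outputs}, got {sorted(produced)}")
        return produced


def uppercase_tool(files: Files) -> Files:
    # A toy "compiler" that upper-cases greeting.txt into greeting.out.
    return {"greeting.out": files["greeting.txt"].upper()}


action = Action(inputs=("greeting.txt",), tool=uppercase_tool, outputs=("greeting.out",))
result = action.run({"greeting.txt": b"hello bazel", "notes.txt": b"not declared"})
print(result)  # {'greeting.out': b'HELLO BAZEL'}
```

Because `run` passes along only the declared inputs, a tool that tried to read `notes.txt` would fail with a `KeyError` instead of silently depending on it. That is a loose analogy to the isolation Bazel enforces around its actions.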

Now that we have established that, let's look at it in more detail. The first thing to consider is that a build action in Bazel is designed to be hermetic. Hermetic is not a term that I use every day, so it's worth looking at its background. Hermetic means airtight: sealed so that nothing gets in or out. In the context of Bazel, a hermetic build action is one that nothing in the external environment can contaminate. Contamination is any variation in the build environment that was not accounted for. Let's say you run a build action that compiles C++ code, and this code is designed to be compiled with the latest version of the Clang compiler. When this code is compiled, either on a build machine or by another developer, contamination occurs if it is compiled with a different (the wrong!) compiler. While this contamination might not cause any issues most of the time, it could mutate into bugs that are difficult to reproduce.

Running hermetic build actions means that you can be sure that given the same inputs, the result will always be the same.

Figure 1.2: Build action
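The compiler example can be sketched in a few lines of Python. Both functions are hypothetical, each "machine" is just a dictionary standing in for a workstation's environment, and the compiler names are made up. The non-hermetic version reads the compiler from the environment, while the hermetic version receives it as an explicit, declared input.

```python
def compile_non_hermetic(source: str, machine_env: dict[str, str]) -> str:
    # Uses whatever compiler the machine happens to have configured.
    compiler = machine_env.get("CC", "cc")
    return f"object({source}, built with {compiler})"


def compile_hermetic(source: str, compiler: str, _machine_env: dict[str, str]) -> str:
    # The compiler is a declared input; the machine environment is passed in
    # but never read, loosely mimicking an action sealed off from its surroundings.
    return f"object({source}, built with {compiler})"


developer_laptop = {"CC": "clang-new"}
build_machine = {"CC": "clang-old"}  # an outdated compiler nobody noticed
PINNED_COMPILER = "clang-new"        # hypothetical pinned toolchain

laptop_out = compile_non_hermetic("main.cc", developer_laptop)
ci_out = compile_non_hermetic("main.cc", build_machine)
print(laptop_out == ci_out)  # False: the environment leaked into the output

laptop_out = compile_hermetic("main.cc", PINNED_COMPILER, developer_laptop)
ci_out = compile_hermetic("main.cc", PINNED_COMPILER, build_machine)
print(laptop_out == ci_out)  # True: same declared inputs, same output
```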

Secondly, hermetic builds mean that both the inputs and the outputs are tracked. If your build action takes a single input file and produces a single output file, then Bazel makes sure that these files truly are what they claim to be. Bazel does this by computing a hash value from the contents of these files, which lets it detect any change in the inputs.
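The hashing idea can be sketched with Python's standard `hashlib` module. This is an illustrative assumption, not Bazel's exact scheme: `action_key` (a hypothetical helper) combines the tool's identity with the path and SHA-256 digest of every input, so changing any byte of any input, or the tool itself, produces a different key.

```python
import hashlib


def digest(data: bytes) -> str:
    # Identical bytes always produce the identical SHA-256 digest.
    return hashlib.sha256(data).hexdigest()


def action_key(tool_id: str, inputs: dict[str, bytes]) -> str:
    # Hash the tool identity plus each input's path and content digest.
    # Sorting the paths makes the key independent of dictionary order, and
    # the NUL separators keep adjacent fields from running into each other.
    h = hashlib.sha256(tool_id.encode() + b"\0")
    for path in sorted(inputs):
        h.update(path.encode() + b"\0" + digest(inputs[path]).encode() + b"\0")
    return h.hexdigest()


inputs = {"main.cc": b"int main() { return 0; }"}
key_before = action_key("clang-18", inputs)
key_same = action_key("clang-18", dict(inputs))
key_edited = action_key("clang-18", {"main.cc": b"int main() { return 1; }"})
key_new_tool = action_key("clang-19", inputs)

print(key_before == key_same)      # True: nothing changed
print(key_before == key_edited)    # False: one input byte changed
print(key_before == key_new_tool)  # False: the tool changed
```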

This feature is one of the things that makes Bazel so fast. Because you can trust that neither your environment nor your inputs have changed, you can be sure that the output will also be the same. In such cases, instead of wasting time running an identical build action, Bazel can simply fetch a cached version of the output and serve it.

Figure 1.3: Cached results
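Once every action has a key, caching is essentially a dictionary lookup. The sketch below is hypothetical (`ActionCache`, `minify`, and the `"minify-v1"` tool id are illustrative names, and a real cache stores files on disk or on a server), but it shows the core rule: run the tool on a cache miss, and reuse the stored outputs on a hit.

```python
import hashlib
from typing import Callable

Files = dict[str, bytes]


class ActionCache:
    """Hypothetical in-memory cache keyed by a hash of the tool and its inputs."""

    def __init__(self) -> None:
        self._entries: dict[str, Files] = {}
        self.executions = 0  # how many times a tool actually ran

    @staticmethod
    def key(tool_id: str, inputs: Files) -> str:
        h = hashlib.sha256(tool_id.encode())
        for path in sorted(inputs):
            h.update(b"\0" + path.encode() + b"\0")
            h.update(hashlib.sha256(inputs[path]).digest())
        return h.hexdigest()

    def run(self, tool_id: str, tool: Callable[[Files], Files], inputs: Files) -> Files:
        k = self.key(tool_id, inputs)
        if k not in self._entries:  # cache miss: run the tool once
            self.executions += 1
            self._entries[k] = tool(inputs)
        return self._entries[k]     # cache hit: reuse the stored outputs


def minify(files: Files) -> Files:
    # Toy tool: strip spaces from app.js.
    return {"app.min.js": files["app.js"].replace(b" ", b"")}


cache = ActionCache()
cache.run("minify-v1", minify, {"app.js": b"let x = 1;"})
cache.run("minify-v1", minify, {"app.js": b"let x = 1;"})  # identical inputs: cache hit
cache.run("minify-v1", minify, {"app.js": b"let x = 2;"})  # changed input: runs again
print(cache.executions)  # 2
```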

With this, we can do something very powerful. We can split our codebase into isolated modules that can be built independently and then chained together into something known as a dependency graph. In the following example, the first build action converts a single input file into a single output file. Then, a second build action takes the output of the first build action, along with one new input file, to create the final output.

Figure 1.4: Action outputs as action inputs
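Chaining actions means that an action depends on whichever action produces one of its inputs. The sketch below derives that dependency graph from declared inputs and outputs and orders it with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The action names, file names, and the byte-joining "tool" in `run_graph` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical two-action graph: the second action consumes the first one's output.
actions = {
    "compile_lib": {"inputs": ["lib.src"], "outputs": ["lib.out"]},
    "build_app": {"inputs": ["lib.out", "app.src"], "outputs": ["app.out"]},
}


def build_order(actions: dict) -> list[str]:
    # Map each output file to the action that produces it.
    producer = {out: name for name, a in actions.items() for out in a["outputs"]}
    # An action depends on the producers of its inputs; plain source files have none.
    graph = {
        name: {producer[i] for i in a["inputs"] if i in producer}
        for name, a in actions.items()
    }
    return list(TopologicalSorter(graph).static_order())


def run_graph(actions: dict, sources: dict[str, bytes]) -> dict[str, bytes]:
    files = dict(sources)
    for name in build_order(actions):
        a = actions[name]
        # Toy "tool": join the input contents into the single declared output.
        files[a["outputs"][0]] = b"+".join(files[i] for i in a["inputs"])
    return files


print(build_order(actions))  # ['compile_lib', 'build_app']
print(run_graph(actions, {"lib.src": b"L", "app.src": b"A"})["app.out"])  # b'L+A'
```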

Now we are starting to see how Bazel enables a codebase to truly scale. By splitting the code into small parts, each with defined inputs and defined outputs, we don’t have to worry about our codebase growing, because anything that has already been built will be fetched from the cache. If you have set everything up correctly, you’ll never have to run an action with identical inputs twice, since Bazel will fetch the results from the cache instead.
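Combining content hashing with the dependency graph gives incremental builds. In this hypothetical sketch, each action's cache key covers its name (standing in for the tool) and the contents of its inputs, and the `build` function reports which actions actually executed. Changing one source file reruns only the actions downstream of it.

```python
import hashlib

# Hypothetical three-action graph, listed in dependency order:
# (action name, input paths, output path).
ACTIONS = [
    ("lib_a", ["a.src"], "a.out"),
    ("lib_b", ["b.src"], "b.out"),
    ("app", ["a.out", "b.out"], "app.out"),
]


def build(sources: dict[str, bytes], cache: dict[str, bytes]) -> list[str]:
    """Run every action in order, skipping those whose inputs are unchanged.
    Returns the names of the actions that actually executed."""
    files = dict(sources)
    executed = []
    for name, inputs, output in ACTIONS:
        # Key = action name (stand-in for the tool) + content hash of each input.
        h = hashlib.sha256(name.encode())
        for path in inputs:
            h.update(hashlib.sha256(files[path]).digest())
        key = h.hexdigest()
        if key not in cache:
            cache[key] = b"|".join(files[p] for p in inputs)  # the toy "tool" runs
            executed.append(name)
        files[output] = cache[key]
    return executed


cache: dict[str, bytes] = {}
print(build({"a.src": b"A1", "b.src": b"B1"}, cache))  # ['lib_a', 'lib_b', 'app']
print(build({"a.src": b"A1", "b.src": b"B1"}, cache))  # []
print(build({"a.src": b"A2", "b.src": b"B1"}, cache))  # ['lib_a', 'app']
```

Note that `lib_b` is skipped in the last build: its input did not change, so its cached output is reused, while `app` reruns because one of its inputs (`a.out`) has new contents.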

Bazel can even fetch cached build outputs from a remote cache server. This means that if a build machine or another developer has already built any of the files you are working with, you should only have to build the changes you make yourself.

Figure 1.5: Scaling the build graph
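A remote cache adds a second, shared lookup layer. The sketch below is a hypothetical in-memory model: `RemoteCache` and `Workstation` are illustrative classes, and a real setup talks to a cache server over the network (in Bazel, this is configured with the `--remote_cache` flag). Each workstation checks its local cache first, then the shared one, and only runs the tool when both miss, uploading the result for everyone else.

```python
import hashlib


class RemoteCache:
    """Stand-in for a shared cache server (hypothetical, in memory)."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}


class Workstation:
    def __init__(self, name: str, remote: RemoteCache) -> None:
        self.name = name
        self.local: dict[str, bytes] = {}
        self.remote = remote
        self.executed: list[bytes] = []  # sources this machine actually built

    def run(self, tool_id: str, source: bytes) -> bytes:
        key = hashlib.sha256(tool_id.encode() + b"\0" + source).hexdigest()
        if key in self.local:          # 1. local cache hit
            return self.local[key]
        if key in self.remote.store:   # 2. shared remote cache hit: download it
            self.local[key] = self.remote.store[key]
            return self.local[key]
        output = source.upper()        # 3. both missed: run the toy tool
        self.executed.append(source)
        self.local[key] = output
        self.remote.store[key] = output  # upload so others can reuse it
        return output


server = RemoteCache()
ci = Workstation("ci", server)
alice = Workstation("alice", server)

ci.run("compile", b"shared library")     # CI builds it first
alice.run("compile", b"shared library")  # Alice downloads CI's result
alice.run("compile", b"my new feature")  # only her own change is built locally
print(ci.executed, alice.executed)       # [b'shared library'] [b'my new feature']
```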


Anatomy of a Bazel Project