Defining the Bazel workspace

There are two main components required for the Bazel build environment to function: the build file, which contains the definitions of the build actions, and the workspace file, which contains any external dependencies along with any rules needed for the build actions to run. In this chapter we will take a deeper look into how to set up the workspace, understand how to add external dependencies, and consider the pros and cons of each method.

We will also look at some more advanced examples of how to make external dependencies available so that the rest of your project can reference them as build targets. We’ll look at examples of how to add external build rules which provide everything we need to compile our code for different programming languages. Lastly, we look at how these tools utilise the toolchain feature in Bazel to make it easy for developers to set up their rules to run on different platforms and build for different architectures.

Constructing a workspace

In the last example we set up a very simple workspace, and all it did was download a single file for us and make it available to our build action. We did this by loading the http_file function from the http.bzl workspace rules file that comes bundled with Bazel:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

With http_file loaded we can now call it, and as we saw in the previous chapter it will execute based on the arguments that we give it and download a file into our workspace (we’ll look at more examples of the http rules later in this chapter).

# WORKSPACE FILE
http_file(
    name = "my-file",
    url = "https://example.com/file.data",
)
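
The downloaded file can then be consumed by build rules like any other input; http_file exposes it as the target @my-file//file. As a minimal sketch (the genrule here is just an illustrative example, not part of the original workspace):

```python
# BUILD file (sketch): consuming the file fetched by the http_file
# rule above. http_file makes the downloaded file available as the
# target @my-file//file.
genrule(
    name = "copy_file",
    srcs = ["@my-file//file"],
    outs = ["file.data"],
    cmd = "cp $(location @my-file//file) $@",
)
```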

Adding files to the workspace is fundamentally what happens behind the scenes when running workspace rules, but that's usually only the first step. In the example from the previous chapter we fetched the cover image as a static file, but the workspace can do much more. Instead of downloading a single file we can download whole toolchains, which can be used to do pretty much anything. Let's look at an example of how we would set up our workspace to be able to compile a binary file from our Python code (which will be a big part of future chapters).

# WORKSPACE FILE
# Note that this code is only an example and a few steps were skipped.

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "rules_python",
    url = "https://github.com/bazelbuild/…/rules_python.tar.gz",
)

load("@rules_python//python:repositories.bzl", "python_register_toolchains")

python_register_toolchains(
    name = "python_3_11",
    python_version = "3.11",
)

Let's look at what this code accomplishes:

  • We load the http_archive rule from the built-in workspace rules.

  • We download the rules_python.tar.gz archive from the internet.

  • We load the python_register_toolchains function from the rules_python repository.

  • We call the python_register_toolchains function and give it a name and the Python version we want to use.

The rules_python repository handles downloading the correct Python version into the workspace, in our case Python 3.11, and sets it up so that it is ready to use when we want to build our Python project.

Our goal in adding this logic is to make it so that build actions are able to compile a binary from Python source code. We load the repository rule python_register_toolchains in the workspace file in order to choose which Python interpreter we’ll be using. The Python rules also give access to a build rule called py_binary. When the py_binary rule is run, it will do so based on how the workspace has been set up. This is very useful: if you ever need to upgrade the environment (workspace) where the Python code is compiled, you only have to do that in the workspace file.

# BUILD FILE
load("@rules_python//python:defs.bzl", "py_binary")

py_binary(
  name = "main",
  srcs = ["main.py"],
)

We are using the Python rules as an example here since many of the later chapters of this book will expand on them, but the process is exactly the same if you want to use C++, Android, Rust or whatever language you want. You can even mix and match, so if your project has both Python logic and C++ logic you can just as easily configure your workspace to support both. This is one of the key benefits of Bazel.
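
As a rough sketch of what mixing languages looks like, the workspace below fetches both the Python and C++ rule sets. The URLs are placeholders, and a real setup would also pin sha256 (and often strip_prefix) values:

```python
# WORKSPACE FILE (sketch) — URLs are placeholders, not real release paths.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Rules for building Python targets.
http_archive(
    name = "rules_python",
    url = "https://github.com/bazelbuild/rules_python/...",  # placeholder
)

# Rules for building C++ targets, fetched the same way.
http_archive(
    name = "rules_cc",
    url = "https://github.com/bazelbuild/rules_cc/...",  # placeholder
)

load("@rules_python//python:repositories.bzl", "python_register_toolchains")

python_register_toolchains(
    name = "python_3_11",
    python_version = "3.11",
)
```

With both rule sets registered, BUILD files anywhere in the project can load py_binary from rules_python and cc_binary from rules_cc side by side.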

External dependencies

You’ve probably had a nagging feeling when we’ve talked about external dependencies that there must be a fundamental flaw in keeping hermeticity when depending on data externally, especially from sources that you don’t control. We’ve already established that placing the workspace file at the root of your file structure marks all files within that folder structure as part of the workspace, and Bazel makes sure that if something changes it is known. So any file that is in the local area is fine. But what about files that we download directly from the internet?

In the last chapter we looked at fetching a file through the http_file repository rule. Bazel has a few other useful rules built in to fetch external dependencies, but before we look at how we use these we need to address the elephant in the room. We’ve talked about how important it is to keep everything hermetic so that we can be sure that our build actions run consistently and correctly across environments. Downloading a file through a URL means that Bazel can’t be sure the file is the same. Let's look at the code we wrote in the last chapter where we downloaded the cover for our book from an external URL:

# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
	name = "book_cover",
	url = "https://example.com/book_cover_image.png",
)

The URL path in this case comes from example.com, a domain which we don’t control. There is a possibility that the owner of the domain decides to change the book_cover_image.png on their site or even remove it.

Let's see what can be done to help with this issue. If you followed the example in chapter one you might have noticed that we got a warning when we ran the build action that used the book_cover_image.png file:

DEBUG: Rule 'book_cover' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "57bf73fa60189919929…"

This warning tells us that we can provide an argument to the http_file rule that validates that the file we want to use is truly the same one between runs. The argument, called sha256, allows us to provide a hash value: a unique identifier generated from the contents of the file using the SHA-256 algorithm. If you don’t provide this value, Bazel will give you the warning above, and you can paste the value into the sha256 argument of the http_file function:
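
To make the hash less abstract, here is a small sketch using Python's standard hashlib module, which implements the same SHA-256 algorithm Bazel uses for this argument (the helper function name is our own, not part of Bazel):

```python
import hashlib

def sha256_of_file(path):
    """Compute the SHA-256 hex digest of a file's contents, reading
    in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# The digest is deterministic: the same bytes always hash to the
# same value, which is why Bazel can use it to detect changes.
print(hashlib.sha256(b"hello").hexdigest())
# → 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

On most systems you can get the same value from the command line with `sha256sum <file>` (or `shasum -a 256 <file>` on macOS).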

http_file(
	name = "book_cover",
	url = "https://example.com/book_cover_image.png",
	sha256 = "57bf73fa60189919929…",
)

Adding the sha256 argument ensures that the file used by the build actions is precisely the file that we wanted. But there is a catch: by default, Bazel caches external dependencies locally to speed up builds. In the example in chapter 1 we noticed that the build time went from 7 seconds to a fraction of a second; most of the speed improvement came from the fact that we didn’t have to download the file again. This is great for rapid development because we don’t want to have to download the same file every time we run a build action.

We can run into a scenario where the file in the cache is the correct version while the file on the external server has already changed. A developer that does not have a populated cache will download a new file and you’ll end up in a scenario where a build fails on one machine and succeeds on another. 

By using the external dependency cache in this way we sacrifice a bit of correctness for speed. Let's take a closer look at the external dependency cache and what we can do to get back to being both fast and correct.

External Dependency Cache

The external dependency cache is where Bazel stores any external dependencies that you have defined in your workspace once they have been fetched from the external source. These can be anything from simple files to fully fledged tools, and they can range from a few kilobytes in size up to multiple gigabytes.
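
If you want to see the fetched dependencies on disk, they live under Bazel's output base (the exact layout is an implementation detail and can vary between Bazel versions):

```shell
# Print the output base for the current workspace.
bazel info output_base

# Fetched external repositories are stored under its "external" directory.
ls "$(bazel info output_base)/external"
```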

When a build action, or any other logic, depends on an external dependency, Bazel will first check this cache to see if it has already fetched the data. If it finds a match it will add it to the workspace and use it. If Bazel does not find the dependency it will run the defined workspace rule and fetch the dependency from the external source. The only way to ensure that the external dependency is re-fetched and checked every time is to purge (or expunge) the cache before you run the action:

bazel clean --expunge

This command completely cleans out the whole Bazel environment, including any caches, and forces a clean build each time. By running clean with the expunge flag we get back to being correct, but we sacrifice a lot of speed (so this is not recommended).

Another solution is ensuring that external files are always fetched from locations that you control, or at least from a location where you know when they change. So if a new version of the book cover gets introduced, it is uploaded to a new URL path and the external dependency in the workspace file is updated to point to it. You can also accomplish this by using the canonical_id argument.

With a canonical_id argument added, the external dependency on the book cover file looks like this; as we noted, it will fetch the image from the cache as long as the cached copy is available:

http_file(
	name = "book_cover",
	url = "https://example.com/book_cover_image.png",
	sha256 = "57bf73fa60189919929…",
	canonical_id = "book_cover_version_02",
)

Any time you know that the target file has changed, you can update the canonical_id value as part of the external dependency, and doing so will invalidate the cache.

If we change the canonical_id value to book_cover_version_02, Bazel will check this particular repository and make sure that it has this canonical id. If it does not, even though the link is the same and the sha256 is the same, Bazel will download the target file again into the workspace. Once the file is downloaded again, Bazel will check the sha256 hash and correctly report whether the file that came from the external source is what we expected.

If the file that is fetched through the link has not changed, then everything will continue to work the same way (and we just wasted a tiny bit of extra time downloading the external dependency again). But if the file has changed, that is, the sha256 of the downloaded file does not match what we expected, then Bazel will throw an error and let us know that it cannot perform the requested action because an external dependency is not what we expect.

The problem with this is that you have to know that the external file has changed in order to update the canonical id and invalidate the cache, so you still can’t guarantee that your builds are correct.

The solution that results in the fastest and most correct outcome is to simply move all the dependencies into the workspace. If we continue with the book cover example, we would just commit the book cover image into source control, along with the source files and all of the tools. That way no download is needed at all: the file is part of the workspace, and Bazel can verify it like any other source file.
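
Once the image is committed to the repository, the http_file rule disappears entirely and a BUILD file can expose it directly. A minimal sketch (the filegroup wrapper is one common convention, not a requirement):

```python
# BUILD file (sketch): book_cover_image.png is now committed to the
# repository, so it is just a source file. Wrapping it in a filegroup
# gives other packages a named target to depend on.
filegroup(
    name = "book_cover",
    srcs = ["book_cover_image.png"],
)
```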

Moving tools and dependencies into the same workspace and having them source controlled together is referred to as a monorepo. There are both benefits and drawbacks to structuring your project as a monorepo, but from the perspective of ensuring the most correct and fast build actions this is the way to go. Bazel is the open source version of a tool developed by Google, where the codebase is kept in a monorepo and making builds fast and correct is incredibly important. This was the motivation for Bazel.

At the end of the day, Bazel is a tool and you can use it however you want. It is designed to help you achieve fast and correct builds, but it does not force you to.

If your project does not require complete hermeticity, then using external dependencies and cleaning out the local cache is fine. If your project does depend on external files and tools but your workflow supports versioning them, then either providing a versioned URL to the external file or using the canonical_id argument is the right thing for you.
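
The versioned-URL approach can be sketched like this; the v2 path is hypothetical, and the sha256 placeholder would be the hash of the new file:

```python
# WORKSPACE FILE (sketch): publishing each new version of the file
# under a new path means the url and sha256 change together, so the
# cache is invalidated naturally whenever the version is bumped.
http_file(
    name = "book_cover",
    url = "https://example.com/v2/book_cover_image.png",  # hypothetical versioned path
    sha256 = "...",  # hash of the v2 file
)
```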

The key to getting the most out of Bazel is understanding the impact of your choices. Bazel does not force hermeticity; it just gives you the environment to achieve it if that's the right thing for you. But as we’ve seen, the more hermetic the build environment is and the more control you have over it, the faster you can make your builds (through the use of the cache) and the more correct they can be (by ensuring that anything that goes into the build actions is what we expect it to be).

With that being said, let's dive into the different repository rules that come bundled with Bazel and explore their benefits and drawbacks.
