Ultimately impure

Nix implements reproducible and hermetic software builds using the metaphor of pure functions. Builds have a well-defined set of inputs, and for the same inputs, they produce the same output. Unfortunately, operating systems in general are not deterministic, partially as a result of their hardware not being deterministic (especially with SMP, branch predictors and out-of-order execution).

Opening Pandora’s Box

Unfortunately, computer hardware cannot be fully deterministic as the real world is not considered deterministic according to our current understanding of physics. This is where things like quantum physics, chaos theory, predicting the future and whether or not free will can even exist come into play.

Let’s get back on topic.

Nix’s purity model

Nix deals with the following impurities:

  • Client-side network access
  • Link rot (source tarball URLs)
  • Insufficiency of build resources (out of memory, slow IO causing timeouts)
  • Minor hardware differences (CPUID)

These impurities are usually considered inconsequential. Nix will throw an error, and the operator is expected to fix the issue and retry the build. This is also why Nix doesn’t cache build failures. In a perfect world, a build failure could only be fixed by editing the derivation inputs. In our world, a build failure may also be caused by these impurities. Nix does not define how many browser tabs you have open at any given time. As a result, builds may or may not run out of memory. Therefore, always retrying failed builds makes sense.

Link rot is a more widespread problem that the World Wide Web in general faces. archive.org is a well-known project that deals with this challenge. The Software Heritage project deals with a related challenge: Permanently archiving software source code. The NixOS organization has its own workarounds for this problem. Maybe IPFS will save us in the future?

In addition to the ones named above, Nix introduces its own impurities for practical purposes. These are intended to help abstract away hardware differences (build parallelism) across multiple machines, or allow the injection of secret data into the build process. Secret data should usually not be part of explicit derivation inputs, similar to how it shouldn’t be part of source control.

  • sandbox-paths: Paths to mount into the sandbox, usually just /bin/sh
  • $NIX_BUILD_CORES: Number of cores for make -j X
  • FODs and impureEnvVars: See below
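To make the $NIX_BUILD_CORES impurity tangible, here is a minimal sketch. It assumes a <nixpkgs> channel is available; the derivation name is made up for illustration. The output depends on a value that is not part of the derivation inputs, so two machines with different settings may produce different results for the same derivation.

```nix
# Sketch only: writes the builder's core count into the output.
# "show-build-cores" is a hypothetical name chosen for this example.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "show-build-cores" { } ''
  # NIX_BUILD_CORES comes from the builder's Nix configuration,
  # not from the derivation itself.
  echo "$NIX_BUILD_CORES" > "$out"
''
```

In real packages the variable is normally only used to pick a parallelism level (make -j), which should not change the build result.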

Working around the impurity of Internet access

Nix builds usually do not have network access, and for good reason: Downloading things from the network is impure as fuck! There is one exception to that: The Fixed Output Derivation (FOD). FODs are effectively a convenience optimization. Instead of having Nix download all your source code by itself, you can instead delegate this task to a derivation. A FOD is designated by two things:

  • The ability to access the network
  • The requirement of producing the exact same output every time

That second part is implemented by hashing the output and comparing it to the value specified in the derivation attribute outputHash. This makes it possible to implement arbitrarily complex artifact downloads (think cargo vendor or go mod vendor), as long as they download the same things every time given their inputs.
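A hand-written FOD could look roughly like the sketch below. The URL and derivation name are placeholders, and lib.fakeHash is used as a stand-in hash on the assumption that you replace it with the real value once Nix reports the mismatch. In practice you would normally reach for a Nixpkgs fetcher such as fetchurl instead of writing this by hand.

```nix
# Sketch of a fixed-output derivation. The URL is hypothetical.
{ pkgs ? import <nixpkgs> { } }:

pkgs.stdenvNoCC.mkDerivation {
  name = "example-source.tar.gz";
  nativeBuildInputs = [ pkgs.curl ];

  # Network access is allowed here because the output hash is declared below.
  buildCommand = ''
    curl --fail --location \
      --cacert "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" \
      --output "$out" \
      "https://example.org/example-source.tar.gz"
  '';

  # These attributes are what make this a FOD.
  outputHashMode = "flat";        # hash the single output file as-is
  outputHashAlgo = "sha256";
  outputHash = pkgs.lib.fakeHash; # placeholder, replace with the real hash
}
```

If the downloaded file ever changes, the hash check fails and the build is rejected, no matter how the builder obtained the file.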

Dealing with build-time secrets

Secrets required at build time usually fall into two categories:

  • Credentials to access private software repositories
  • Keys to sign build artifacts

And nothing else matters

There are a number of tricks to solve these problems. A fancy one for private repo credentials is impureEnvVars. This derivation attribute allows you to specify which environment variables should be carried into the builder environment from the environment of the Nix process starting that build. This only applies to FODs, however. Nixpkgs usually adds $http_proxy and related variables to the list, enabling operators to configure network proxies for use with Nix builds.

The limitation to FODs highlights that this is only meant to deal with differences between networks. Because the state of the network is generally unknown, this can be seen as a valid exception to the general purity rule. One could also employ iptables tricks and perform transparent proxying that way, and Nix would be none the wiser. Integrity and purity are still ultimately enforced by outputHash.

The same goes for providing secrets for downloads. All you need to do is ensure that the secrets are configured, otherwise your build will simply fail. A missing secret effectively becomes equivalent to a network outage. With FODs, the Nix store can be considered a partially content-addressed store.
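As a sketch of how a credential could be passed through impureEnvVars: the variable name PRIVATE_REPO_TOKEN, the URL and the bearer-token scheme are all assumptions made for this example, not something a particular server or fetcher prescribes. lib.fetchers.proxyImpureEnvVars is the proxy variable list from Nixpkgs.

```nix
# Sketch only: a FOD that reads a token from the environment.
# PRIVATE_REPO_TOKEN and the URL are hypothetical.
{ pkgs ? import <nixpkgs> { } }:

pkgs.stdenvNoCC.mkDerivation {
  name = "private-source.tar.gz";
  nativeBuildInputs = [ pkgs.curl ];

  # Proxy variables from Nixpkgs plus our own (assumed) token variable.
  impureEnvVars = pkgs.lib.fetchers.proxyImpureEnvVars ++ [ "PRIVATE_REPO_TOKEN" ];

  buildCommand = ''
    curl --fail --location \
      --cacert "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" \
      --header "Authorization: Bearer $PRIVATE_REPO_TOKEN" \
      --output "$out" \
      "https://git.example.org/private/source.tar.gz"
  '';

  outputHashMode = "flat";
  outputHashAlgo = "sha256";
  outputHash = pkgs.lib.fakeHash; # placeholder, replace with the real hash
}
```

The token never becomes part of the derivation, so it does not influence the store path. If it is missing, curl fails and so does the build.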

If you’re only fetching simple files, the netrc file may be a good alternative to impureEnvVars. Built-in fetchers can rely on this file for credentials. Keep in mind that built-in fetchers always run within the original Nix process. The advantage of this is that you don’t need to equip all the machines in your build farm with secrets. The client will handle authenticated downloads by itself and then send the files to the builders as needed.
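On NixOS, pointing Nix at a netrc file might look like the fragment below. The file path is an assumption for this example. On other systems the same netrc-file setting would go into nix.conf.

```nix
# NixOS module fragment (sketch). The path is hypothetical; the file should be
# readable only by the users that are supposed to fetch with these credentials.
{ ... }:
{
  nix.settings.netrc-file = "/var/lib/secrets/nix-netrc";
}
```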

My own pure world

Signing build artifacts is a more challenging task. We can’t rely on impureEnvVars because signing usually isn’t done inside a FOD. Instead, you can make use of sandbox-paths (or extra-sandbox-paths). This allows you to mount your secret key files into the sandbox. The builder can then access the key and use it to sign build artifacts. This gives you some control over what is considered pure and what isn’t.
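On a NixOS builder, exposing a key to the sandbox could be configured roughly as follows. The key location is made up for this example, and the fragment assumes the machine uses the nix.settings module options.

```nix
# NixOS module fragment (sketch). /run/signing/key.pem is a hypothetical path.
{ ... }:
{
  # Every sandboxed build on this machine can read the file at the same path.
  nix.settings.extra-sandbox-paths = [ "/run/signing/key.pem" ];
}
```

Keep in mind that this makes the key readable by any build that runs on this machine, not only by the ones that are supposed to sign something.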

If you’re operating a build farm, you would need to ensure that every builder has the same impurities configured, otherwise builds may randomly fail depending on which machine they landed on. If you use system-features, you can make your life a lot easier here.

Custom system-features

system-features is a seldom-touched, but very interesting configuration option for Nix. Together with requiredSystemFeatures, it allows you to specify that certain builds should only be executed on certain machines. This is useful to ensure large compilations are only performed on machines with adequate hardware resources, but it can also be used to direct builds to machines where certain agreed-upon impurities have been configured. You could have a single machine with the system feature signer assigned to it, and a sandbox configuration that mounts a secret key to sign something. All signing-related operations could then be delegated to this machine exclusively by setting requiredSystemFeatures = [ "signer" ]; in derivations that handle signing. This also removes the need to equip all machines in your build farm with potentially highly sensitive data. A similar thing can be implemented for repo credentials, allowing only a select few “fetcher nodes” to handle downloading of private source code.
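A signing derivation pinned to such a machine could be sketched like this. It assumes the builder advertises the signer feature in its system-features and mounts a key at /run/signing/key.pem into the sandbox. The key path, the payload and the choice of openssl are all illustrative.

```nix
# Sketch only: signs a dummy payload on a machine offering the "signer" feature.
# The key path is hypothetical and must match the builder's sandbox configuration.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "signed-artifact"
  {
    nativeBuildInputs = [ pkgs.openssl ];
    # Only machines that list "signer" in system-features may run this build.
    requiredSystemFeatures = [ "signer" ];
  }
  ''
    mkdir -p "$out"
    echo "example payload" > "$out/artifact.bin"
    # Create a detached signature using the key mounted into the sandbox.
    openssl dgst -sha256 -sign /run/signing/key.pem \
      -out "$out/artifact.bin.sig" "$out/artifact.bin"
  ''
```

On a machine without the signer feature, Nix will refuse to run this build locally and will try to dispatch it to a remote builder that offers the feature.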