Git is the version control tool of choice for most people. One area where it unfortunately falls short is the storage of very large files, especially binary files. Git LFS, the usual solution to this problem, is sub-par and, according to many anecdotes, not worth the trouble.
DVC is a good candidate for an alternative. Originating from the machine learning community, it has some odd quirks (content-addressing files via insecure MD5 hashes, analytics spyware), but nothing a couple of patches can't fix. Patching DVC to use SHA256 instead of MD5, and JSON instead of YAML for its .dvc files, creates a workable tool for dealing with large files.
DVC uploads files to a configurable remote, such as an S3 bucket, and leaves a .dvc file in the Git repo containing the SHA256 hash of the file. On the remote, the file is named by its hash, so knowing the hash and a base URL is enough to download it. DVC asset files can therefore be fetched with Nix by simply reading the .dvc file as JSON and reconstructing the URL from the hash.
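{ fetchurl }:
{ cdnURL ? "https://cdn.privatevoid.net/assets", index }:
let
  # Parse the .dvc file, which the patched DVC writes as JSON.
  dvc = builtins.fromJSON (builtins.readFile index);
  # Only the first output listed in the .dvc file is used.
  inherit (builtins.head dvc.outs) sha256 path;
  # The URL is built as <first two hex digits>/<remaining digits>.
  hashPrefix = builtins.substring 0 2 sha256;
  # A negative length makes substring return the rest of the string.
  hashSuffix = builtins.substring 2 (-1) sha256;
in
fetchurl {
  name = path;
  url = "${cdnURL}/${hashPrefix}/${hashSuffix}";
  inherit sha256;
}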
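To make the URL scheme concrete, here is a small worked example that can be evaluated on its own with nix-instantiate --eval --strict. The pointer contents, the file name logo.png and the base URL cdn.example.org are made up for illustration; the hash is simply the SHA256 of an empty file. Only the outs, sha256 and path fields are taken from the fetcher above, and a real .dvc file may well contain more.

```nix
let
  # Hypothetical contents of a pointer file such as logo.png.dvc.
  pointer = builtins.fromJSON ''
    {
      "outs": [
        {
          "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
          "path": "logo.png"
        }
      ]
    }
  '';

  inherit (builtins.head pointer.outs) sha256 path;

  # Hypothetical base URL, standing in for cdnURL.
  cdnURL = "https://cdn.example.org/assets";
in
{
  name = path;
  # Same split as in the fetcher: two hex digits, a slash, then the rest.
  url = "${cdnURL}/${builtins.substring 0 2 sha256}/${builtins.substring 2 (-1) sha256}";
}
```

The resulting url is https://cdn.example.org/assets/e3/b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, which is what fetchurl would be asked to download.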
This idea can be expanded to automatically replace all the files ending with .dvc in a source tree with the assets as downloaded by fetchurl. As such, a source directory with DVC-managed files in it can be passed into a Nix build using something as simple as src = hydrateAssetDirectory ./.;
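One way such a hydrateAssetDirectory function could look is sketched below. This is not necessarily how the real one is written: it assumes that fetchAsset is the fetcher from earlier, already applied to { fetchurl }, that every pointer file is named after its asset plus a .dvc suffix, and that lib and runCommand come from nixpkgs. The names fetchAsset, pointers and hydrate are made up for this sketch.

```nix
# Hypothetical sketch of hydrateAssetDirectory, not the original implementation.
# fetchAsset is assumed to be the fetcher shown earlier, applied to { fetchurl }.
{ lib, runCommand, fetchAsset }:

src:
let
  root = toString src;

  # Every .dvc pointer file in the tree, as a path relative to the root.
  pointers = builtins.filter (lib.hasSuffix ".dvc") (
    map (file: lib.removePrefix "${root}/" (toString file))
      (lib.filesystem.listFilesRecursive src)
  );

  # Shell snippet that swaps one pointer file for a symlink to its asset.
  hydrate = pointer:
    let
      asset = fetchAsset { index = src + "/${pointer}"; };
      # Assumption: the asset lives next to its pointer, minus the .dvc suffix.
      target = lib.removeSuffix ".dvc" pointer;
    in
    ''
      rm "$out"/${lib.escapeShellArg pointer}
      ln -s ${asset} "$out"/${lib.escapeShellArg target}
    '';
in
runCommand "hydrated-source" { } ''
  # Copy the tree and make it writable so the pointers can be replaced.
  cp -r ${src} "$out"
  chmod -R u+w "$out"
  ${lib.concatMapStringsSep "\n" hydrate pointers}
''
```

Symlinking rather than copying keeps the large files in a single store path each, so the hydrated tree itself stays small. A build that needs real files instead of symlinks could swap ln -s for cp.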