Sam's blog
February 7, 2022

Brute forcing NixOS store paths for no particular reason

Posted on February 7, 2022  •  6 minutes  • 1072 words

The Nix store

You might have noticed that when you build a package using Nix, it gets stored in /nix/store in a directory named something like xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-hello-2.10.

While building a package I noticed that the hash started with “123”, so that got me wondering, could I easily get it to say anything I wanted at the beginning?

The letters and numbers before the package name are a SHA1 hash that’s been base32 encoded . Note that since Nix uses a custom base32 encoder some words will be impossible to spell.

But where does the hash come from? There’s an excellent pill about this: Nix Store Paths . We’re interested in the Output paths section, so since the output path only depends on inputs, we can manipulate some attribute of the derivation to affect the store hash.

The idea

We can get the output path of a package using the .outPath attribute:

% nix-instantiate --eval -E '(import <nixpkgs> {}).hello.outPath'
"/nix/store/26x36dwihjw0d9kkzlk9qhl9ha2mx3jp-hello-2.10"

So let’s use overrideAttrs to insert a dummy attribute and see if the hash changes:

nix-instantiate --eval -E '((import <nixpkgs> {}).hello.overrideAttrs (old: {test = 1;} )).outPath'
"/nix/store/syn0aajw44lzg7l6kwn58ksvhjv9ll81-hello-2.10"

Nice, so even attributes that don’t actually affect the package at all will change the hash. That’s sensible since at this stage Nix doesn’t know what will affect the build in the first place.

So all we have to do is add a attribute with a different value each time and see if the resulting hash matches our criteria.

Another idea is to exploit a flaw in the now deprecated SHA1 to speed this up. However all I could find was how to do a Length extension attack or a Collision attack - which lets us find messages to get a certain hash - we want the other way around, a Preimage attack . So we’re stuck with brute-forcing.

Probability

Assuming that SHA1 is evenly distributed, we can calculate how rare a hash with a certain number of fixed parts is. Since the paths are using base32, each character contains 5 bits of information (25 = 32). For example, let’s say I want a hash that starts with “sam”, that means there’s a 1 in 25 * 3 = 32768 chance that our hash satisfies the condition. Of course more characters will exponentially increase the computation time.

Putting it all together

That Nix expression we previously ran was pretty long, so let’s start by working in a file:

with import <nixpkgs> {};
with lib;
with builtins;
let
  p = pkgs.hello;
in (
  (p.overrideAttrs (oldAttrs: rec { test = 1; })).outPath
)

We can run it like so:

% nix-instantiate --eval test.nix
"/nix/store/syn0aajw44lzg7l6kwn58ksvhjv9ll81-hello-2.10"

Alright, so we’ll need to generate a list of values to change the hash, and then later parallelize it by evaluating our file multiple times.
So how do we get a list of values to substitute into our test attribute? builtins.genList has it covered:

% nix-instantiate --eval -E 'builtins.genList (x: x) 10'
[ <CODE> <CODE> <CODE> <CODE> <CODE> <CODE> <CODE> <CODE> <CODE> <CODE> ]

Uhh… what? <CODE>? Those are supposed to be integers.

Nix is well maintained

…and other jokes you can tell yourself. Apparently nix-instantiate has had this issue for a while now...

Okay, so let’s use nix eval instead:

% nix eval '(builtins.genList (x: x) 10)'
[ 0 1 2 3 4 5 6 7 8 9 ]

Great, now let’s see how to pass arguments to the expression so we can later control where the generated list starts:

% nix eval '({x}: x)' --arg x 1
<LAMBDA>

So it’s not calling the function? Oh wait, this feature has been broken for longer than the issue we ran into with nix-instantiate .

Looks like the workaround is to either generate the whole expression with the value inserted, or to use environmental variables:

% a=hello nix eval '(builtins.getEnv "a")'
"hello"

Putting it all together

with import <nixpkgs> { };
with lib;
with builtins;
let
  p = pkgs.hello;
  nIters = 10000;
  start = (fromJSON (getEnv "i")) * nIters;
in
toJSON (
  map
    (
      val: [
        val
        ((p.overrideAttrs (oldAttrs: rec { test = val; })).outPath)
      ]
    )
    (genList (x: x + start) nIters)
)

(Yes, I’m using fromJSON to convert from a string to an int, sue me) Run the above with

i=0 nix eval -f test.nix ''

The output is (nIters set to 3 to keep it brief):

"[[0,\"/nix/store/l98gmxfxk0l6fp6qwysfdzzlhw7h820g-hello-2.10\"],[1,\"/nix/store/syn0aajw44lzg7l6kwn58ksvhjv9ll81-hello-2.10\"],[2,\"/nix/store/d36kg6qjgbrdc1c3m3gk0vzciqv5vis0-hello-2.10\"]]"

Now all we need is a quick python runner:

import concurrent.futures
import time
import json
import logging
import os
import subprocess
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor


def check_block(i):
  env = os.environ.copy()
  env['i'] = f'{i}'
  try:
    p = subprocess.run(
        ['nix', 'eval', '--raw', '-f', 'test.nix', ''],
        stderr=subprocess.PIPE,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        check=True,
        env=env
    )
  except subprocess.CalledProcessError:
    print(p.stderr)
    raise
  for v, path in json.loads(p.stdout.decode('utf-8')):
    if path.startswith('/nix/store/sams'):
      print(v, path)
      return True
  return False


workers = 24

block = 0
jobs = set()
running = True

with ThreadPoolExecutor(max_workers=workers) as executor:
  while running:
    if len(jobs) < workers:
      jobs.add(executor.submit(lambda i=block: check_block(i)))
      block += 1
      continue

    done, jobs = concurrent.futures.wait(jobs, timeout=1, return_when=FIRST_COMPLETED)
    for fut in done:
      try:
        if fut.result():
          running = False
      except:
        logging.exception('task exception')

And we’re done… right?

Not so fast

Upon running the above script, CPU usage shoots up to 100% but then drops back down to 10%
After some investigation, the nix-daemon process (specifically the workers that it’s spawned) are writing to disk a lot. Turns out that by evaluating a package to get the output path we’re actually creating thousands of derivations in the Nix store. How many thousands?

% nix-collect-garbage -d
deleting '/nix/store/irw78bb6qddvclsgkirb87a3xcvfgfxg-hello-2.10.drv'
deleting '/nix/store/3jrvb4715lj7b0fqn1jdffvj3fnfq9x6-hello-2.10.drv'
deleting '/nix/store/76ps2ar5kp9k1rqnp2dbhsciylq1y37z-hello-2.10.drv'
... # (took about 20 minutes to complete)
718778 store paths deleted, 959.60 MiB freed

That many thousands.

The only winning move is not to play

Derivations are essentially instructions on how to build a package, and creating thousands of them really isn’t an issue (aside from eating away at finite SSD writes). Upon further investigation the slowdown seems to come from the Nix DB, which now has a very inflated size compared to another host:

[kurisu ~]% du -sh /nix/var/nix/db/db.sqlite
17M  /nix/var/nix/db/db.sqlite

[karen-chan ~]% du -sh /nix/var/nix/db/db.sqlite
495M  /nix/var/nix/db/db.sqlite

According to Nix Pills : “It is a sqlite database that keeps track of the dependencies between derivations.”.

Looks like the “correct” way to this would be to look at the nix source code and create my own version of the hashing code, but that’s way out of scope for a weekend project so I’m stopping here. Maybe I’ll revisit this someday.

Goodbye.

Links