Testing Randomness is a Nightmare
Validating that a CSPRNG produces statistically random output should be a solved problem. The tests exist. The math is well understood. NIST published a whole document about it.
And yet, actually running these tests is an absolute mess.
The State of Randomness Testing Tools
The go-to reference is NIST SP 800-22, the Statistical Test Suite. Revision 1a, published in 2010. That's fifteen years of being the "standard" for testing random number generators.
The reference implementation doesn't compile on modern systems without modification. It expects input in a format that isn't clearly documented. The term "binary" in the tool doesn't mean raw bytes; it means the ASCII characters 0 and 1.
So "binary" is actually text. Cool.
What about third party implementations? Surely someone's written something more usable in the last decade and a half.
There's stevenang/randomness_testsuite, a Python implementation of the NIST tests. Last meaningful commit: years ago. Open issues with no responses. It works, mostly, if you can figure out what format it actually wants, which is different from what the NIST tool wants.
There's Dieharder, which bundles the old Diehard tests with the NIST suite and some extras. Better maintained, but now you're dealing with -g 201 vs -g 202 for "file input raw" vs "file input ASCII", and again, what counts as "raw" is not what you think.
There's TestU01, which is actually rigorous and well regarded. It wants you to link against it in C and provide a function pointer. If your RNG isn't in C, have fun with that.
The Input Format Problem
This is where it gets genuinely stupid.
Every tool expects a different input format. None of them clearly document what they want. You'll find yourself asking:
- Binary as in raw bytes?
- Binary as in ASCII "0" and "1" characters?
- Hex encoded?
- Big endian or little endian?
- With or without newlines?
The answer depends on which tool, which version, and sometimes which specific test within the suite. The NIST reference implementation's "binary" mode expects a text file of literal 0s and 1s. Your 1MB of random data becomes 8MB of ASCII.
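If you insist on feeding the NIST reference tool, you end up writing a converter yourself. Here's a rough sketch in C, assuming most-significant-bit-first order within each byte and made-up file arguments; the exact conventions the tool tolerates (bit order, trailing newlines) are something you'll want to confirm against its source, because of course they're not documented.

```c
/* bits2ascii.c - sketch: expand raw random bytes into the ASCII '0'/'1'
 * stream that the NIST STS "binary" mode actually expects. */
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <raw input> <ascii output>\n", argv[0]);
        return 1;
    }
    FILE *in = fopen(argv[1], "rb");
    FILE *out = fopen(argv[2], "w");
    if (!in || !out) {
        perror("fopen");
        return 1;
    }
    int c;
    while ((c = fgetc(in)) != EOF) {
        /* One ASCII digit per bit, MSB first: 1 MB in, 8 MB out. */
        for (int bit = 7; bit >= 0; bit--)
            fputc(((c >> bit) & 1) ? '1' : '0', out);
    }
    fclose(in);
    fclose(out);
    return 0;
}
```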
If you get the format wrong, the tests might fail. Or they might silently produce garbage results. Or they might appear to work perfectly while testing something completely different from your actual output.
There's no standard. There's no clear documentation. There's just trial and error and reading source code to figure out what the tool actually does.
Everything is Abandoned
The pattern across all of these tools is the same: someone wrote it years ago, it mostly works, and nobody's maintaining it anymore.
The NIST suite is from 2010. Dieharder's website looks like 2007. TestU01's last update was 2009. The Python ports on GitHub have issues from 2020 with no responses.
Cryptographic randomness is fundamental to basically everything in security: key generation, IVs, nonces, salts. If your RNG is broken, everything built on top of it is broken. And the tooling for testing it is a collection of abandoned projects with incompatible interfaces and unclear documentation.
That's not great.
What Actually Works
If you need to test a CSPRNG today, here's what I'd recommend:
For quick sanity checks: Dieharder with -g 201 (raw file input) will catch obvious problems. Run the full battery with -a. Some tests will show "WEAK" occasionally even on good RNGs; that's statistical noise, not necessarily a failure. Repeated failures across multiple runs are what to worry about.
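A rough sketch of that workflow in C, with my_rng_bytes() standing in for whatever generator you're actually testing and rng_output.bin as an arbitrary file name:

```c
/* dump_raw.c - sketch: write raw bytes from your RNG to a file for
 * dieharder's file_input_raw generator (-g 201). */
#include <stdio.h>
#include <stdlib.h>

/* Placeholder: replace with the generator under test. */
static void my_rng_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(rand() & 0xff); /* NOT a CSPRNG; demo only */
}

int main(void)
{
    /* Dieharder can rewind and reuse a file that's too small, which quietly
     * weakens the results, so dump far more data than you think you need.
     * This demo writes 16 MB purely for illustration. */
    const size_t total = 16 * 1024 * 1024;
    unsigned char buf[4096];
    FILE *out = fopen("rng_output.bin", "wb");
    if (!out) { perror("fopen"); return 1; }
    for (size_t written = 0; written < total; written += sizeof buf) {
        my_rng_bytes(buf, sizeof buf);
        fwrite(buf, 1, sizeof buf, out);
    }
    fclose(out);
    /* Then: dieharder -a -g 201 -f rng_output.bin */
    return 0;
}
```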
For serious validation: TestU01's BigCrush is the most thorough option. You'll need to write a C wrapper that provides your RNG output as a function. It takes hours to run. If you pass BigCrush, you're probably fine.
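The glue is short, at least. A minimal sketch, assuming TestU01 is installed and using its unif01_CreateExternGenBits / bbattery_* entry points for plugging in an external generator; my_rng_next32() is a dummy stand-in for a function that returns 32 fresh bits from your generator on each call:

```c
/* testu01_wrapper.c - build roughly:
 *   gcc testu01_wrapper.c -o crush -ltestu01 -lprobdist -lmylib -lm */
#include "unif01.h"
#include "bbattery.h"

/* Dummy stand-in: replace with the CSPRNG under test. This counter will
 * (rightly) fail everything. */
static unsigned int my_rng_next32(void)
{
    static unsigned int counter = 0;
    return counter++;
}

int main(void)
{
    unif01_Gen *gen = unif01_CreateExternGenBits("my-csprng", my_rng_next32);

    /* Start with SmallCrush (minutes) before committing to BigCrush (hours). */
    bbattery_SmallCrush(gen);
    /* bbattery_BigCrush(gen); */

    unif01_DeleteExternGenBits(gen);
    return 0;
}
```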
For the NIST tests specifically: stevenang/randomness_testsuite is usable if you read the source to understand the expected input. It's Python, so at least you can debug it.
For implementation correctness: Wycheproof test vectors. This doesn't test statistical randomness, it tests that your cryptographic implementations handle edge cases correctly. Different problem, equally important.
This Shouldn't Be This Hard
Someone could build a modern tool that:
- Accepts actual binary input (raw bytes, like a normal program)
- Runs the NIST tests, Diehard tests, and TestU01 batteries
- Has clear documentation
- Produces human readable output
- Is actively maintained
This tool does not exist. Maybe it should.
Until then, we're stuck duct-taping together abandoned C code from 2010, Python scripts that expect mystery input formats, and statistical tests that sometimes fail randomly (which is, I suppose, appropriate).