asm.s ymm bad xmm good 2026-02-17 20:44:08 -05:00
asmtest.c better benchmarks 2026-02-16 01:08:29 -05:00
avxfull.c idk 2025-07-17 13:47:38 -04:00
avxleast.c idk 2025-07-17 13:47:38 -04:00
avxmid.c idk 2025-07-17 13:47:38 -04:00
benchmark.sh better benchmarks 2026-02-16 01:08:29 -05:00
BENCHMARK.txt update benchmark 2026-02-17 20:56:09 -05:00
chacha.h reorganize and add benchmark 2025-07-16 16:04:33 -04:00
normal.c Remove unnecessary header 2025-08-29 02:50:05 +00:00
README.txt update readme 2026-02-17 20:48:37 -05:00
test.c idk 2025-07-17 13:47:38 -04:00

This is my best attempt at a decent ChaCha20 implementation.

You need a CPU that reports both the `avx512f` and `avx512vl` flags. Other AVX flags are not enough.

To check, run `lscpu | grep avx` and look for both flags in the output.
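The same check can also be done from C at runtime. This is only a rough sketch, assuming GCC or Clang on x86; `has_required_avx512` is a made-up name and is not part of this repo.

```c
#include <stdbool.h>

/* Hypothetical helper (not in this repo): returns true when the running CPU
 * reports both AVX-512F and AVX-512VL. It relies on the GCC/Clang x86
 * builtins; on any other compiler or target it simply reports false. */
bool has_required_avx512(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* safe to call more than once */
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512vl");
#else
    return false;
#endif
}
```

A program could call this once at startup and fall back to the non-AVX code (or exit with a message) when it returns false.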

The full AVX file is about 167.2% faster than the version without AVX, which is very significant.

The asm file, though, is only about 1.4% faster than the full AVX file.
- Note: the init function is about 50% faster, but it isn't a majority of the work.
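Why a 50% faster init barely moves the total can be sketched with Amdahl's law. This is an illustration only: it assumes "50% faster" means a 1.5x speedup of that part, and the 5% share used in the usage note below is a made-up number, not something measured in BENCHMARK.txt.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a `fraction` of the original runtime
 * (0.0 to 1.0) is made `local_speedup` times faster and the rest is
 * left unchanged. */
double overall_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

For example, `overall_speedup(0.05, 1.5)` comes out just under 1.017, so if init were 5% of the runtime, making it 1.5x faster would only make the whole run about 1.7% faster.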

So while I am happy to say that I can hand-roll ASM better than GCC, it probably isn't worth it in most situations.

The actual benchmark, which I ran on a 2-core Xeon Platinum 8175M AWS instance, is in BENCHMARK.txt for anyone who is curious.