chacha20

C 66.5%
Assembly 31.2%
Shell 2.3%

strat 46bed319f2 update benchmark		2026-02-17 20:56:09 -05:00
asm.s	ymm bad xmm good	2026-02-17 20:44:08 -05:00
asmtest.c	better benchmarks	2026-02-16 01:08:29 -05:00
avxfull.c	idk	2025-07-17 13:47:38 -04:00
avxleast.c	idk	2025-07-17 13:47:38 -04:00
avxmid.c	idk	2025-07-17 13:47:38 -04:00
benchmark.sh	better benchmarks	2026-02-16 01:08:29 -05:00
BENCHMARK.txt	update benchmark	2026-02-17 20:56:09 -05:00
chacha.h	reorganize and add benchmark	2025-07-16 16:04:33 -04:00
normal.c	Remove unnecessary header	2025-08-29 02:50:05 +00:00
README.txt	update readme	2026-02-17 20:48:37 -05:00
test.c	idk	2025-07-17 13:47:38 -04:00

README.txt

This is my best attempt at making a decent chacha20 algorithm.

You need a machine whose CPU reports both the `avx512vl` and `avx512f` flags. Other AVX flags don't count.

To check, run `lscpu | grep avx` and look for both flags in the output.
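The same check can also be done from C at run time. The sketch below is not part of this repo: `has_required_avx512` is a hypothetical helper name, and it assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. On any other compiler or architecture it simply reports false.

```c
#include <stdbool.h>

/* Hypothetical helper, not from this repo.
   Returns true only when the running CPU reports both avx512f and avx512vl. */
bool has_required_avx512(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* safe to call more than once */
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512vl");
#else
    return false; /* not x86, or no builtin: treat as unsupported */
#endif
}
```

A program could call this once at startup and fall back to a non-AVX code path when it returns false.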

The full AVX file is about 167.2% faster than the no-AVX version, which is very significant.

The asm file, though, is only about 1.4% faster than the full AVX file.
- Note: the init function is about 50% faster, but that isn't a majority of the work.
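To make the percentages above concrete, here is a small sketch of the arithmetic. It assumes "X% faster" means a throughput ratio (old time divided by new time, minus one), which may not be exactly how BENCHMARK.txt was summarized. It also uses Amdahl's law to show why a large speedup in a small part of the work barely moves the total. Both function names are made up for illustration.

```c
/* Hypothetical helpers, not from this repo. */

/* "X% faster" read as a throughput ratio: 100 * (t_base / t_new - 1). */
double percent_faster(double t_base, double t_new)
{
    return 100.0 * (t_base / t_new - 1.0);
}

/* Amdahl's law: overall speedup when a fraction `frac` of the runtime
   is sped up by a factor `part_speedup` and the rest is unchanged. */
double overall_speedup(double frac, double part_speedup)
{
    return 1.0 / ((1.0 - frac) + frac / part_speedup);
}
```

With made-up timings, `percent_faster(2.672, 1.0)` comes out to about 167.2. If init were, hypothetically, 4% of the runtime, `overall_speedup(0.04, 1.5)` would be about 1.0135, so a 50% faster init would make the whole run only around 1.35% faster.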

So while I am happy to say that I can hand-roll ASM better than GCC, it probably isn't worth it in most situations.

The actual benchmark, which I took on a 2-core Xeon Platinum 8175M AWS instance, is in BENCHMARK.txt for anyone who is curious.