Cryptography

CPU

Most of the work newer than what is described below is available in the latest supercop (20190910). One exception is a Chacha20 implementation using the current (0.8.0) draft specification of the V extension to the RISC-V architecture. This version uses builtin functions developed as part of the compiler work in the European Processor Initiative project. Source code can be found here. More information on the intrinsics and compiler can be found here and here.

The archive crypto_stream_chacha20_dolbeau_riscv-v.tgz was last modified on 2020-04-19 (change the way VL scalability is used).

Somewhat old stuff

I've written implementations of some stream and AEAD ciphers using C + intrinsics. I find intrinsics more readable than raw assembly, and much easier to write. Also, modern compilers do a lot of things very well, and I like to take advantage of that. The implementations are mostly targeted at the Intel C compiler, ICC, but should also work with GCC. Stream algorithms include salsa20 and chacha20 (both using SSE/AVX/AVX2/AVX-512) and AES-256 in counter mode (using AES-NI). You can find the archive here. AEAD algorithms are AES-256 in GCM mode (using AES-NI & PCLMULQDQ) and HS1-SIV with the -hi parameters (using AVX2, to a specification slightly newer than the reference code in supercop-20140622; updated code was made available in supercop-20140907). You can find the archive here. There is now also a patch against the official supercop release.
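To give an idea of the kind of computation these stream implementations vectorize, here is a minimal portable C sketch of the ChaCha quarter round. It is not taken from the archives: the function names `rotl32` and `chacha_quarter_round` are chosen for illustration only, and the code uses plain scalar C rather than intrinsics. A SIMD version would typically apply the same add/xor/rotate steps to several words or blocks at once.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on four 32-bit state words, updated in place.
 * Names are illustrative; this is a scalar sketch, not the archive code. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

The only operations are 32-bit addition, XOR and rotation by fixed amounts, which is why the algorithm maps well onto SSE/AVX2/AVX-512 registers.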

Note that the interfaces and directory hierarchy are designed to fit in the supercop benchmark. A version of these codes is included in the 20140905 version of supercop, later updated in the 20140910 version. The published results of the benchmark (as of 2014/09/10) do not include compilation with ICC. Results are available for all algorithms.

Beware: those implementations are purely designed for speed on recent Intel architectures (mostly Haswell and newer), and ARMv8 (64-bit) with the crypto extension. They were not verified to be resistant to side-channel attacks. It's probably safer to stick to reputable libraries for your cryptographic needs. Prof. Dan Boneh makes a very compelling argument during his excellent course over at Coursera.

Differences from the version in supercop-20141124 include updating HS1-SIV to v2 of the specification (unfortunately, the name was not changed and is still v1). As of supercop-20160717, my HS1-SIV implementation is properly labelled v2.
Newer versions of supercop have some additional implementations, mostly the same algorithms with different key sizes.

The archive crypto_stream-intrinsics.tgz was last modified on 2016-05-04.

The archive crypto_aead-intrinsics.tgz was last modified on 2016-05-04.

The archive crypto_core-intrinsics.tgz was last modified on 2016-05-04.

The patch supercop_20141124_patch_20160504.patch was last modified on 2016-05-04.

The 20160504 version also contains some crypto_core algorithms, and aes256gcmv1 for ARMv8+crypto.

To test all the algorithms, you will need a CPU supporting AVX2, AES and PCLMUL for x86_64, and the crypto extension for ARMv8. Some algorithms will run on less than that, but this has not been extensively tested. Tested compilers (beyond what is tested in the official supercop results) include the following, among others:

ICC version 15.0.3: icc -m64 -march=native -mtune=native -O3 -fomit-frame-pointer
GCC version 4.7.2: gcc -m64 -march=native -mtune=native -O3 -fomit-frame-pointer
GCC version 4.9.2: gcc-4.9.2 -m64 -march=native -mtune=native -O3 -fomit-frame-pointer [results seem better than with 4.7.2]
GCC version 5.1.0: gcc-5.1.0 -m64 -march=native -mtune=native -O3 -fomit-frame-pointer
LLVM/CLANG version 3.4.1: clang -march=x86-64 -mcpu=core-avx2 -mavx2 -maes -mpclmul -O3 -fomit-frame-pointer
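Before running the x86_64 variants, it can be useful to check that the CPU actually reports the required features. The following is a minimal sketch, assuming a reasonably recent GCC or Clang that provides `__builtin_cpu_init` and `__builtin_cpu_supports` and accepts the feature names "aes" and "pclmul" (older compiler releases may only know a subset of names). The function name `have_x86_crypto_features` is hypothetical, and the ARMv8 crypto extension is not covered here.

```c
/* Returns 1 if the running CPU reports AVX2, AES-NI and PCLMULQDQ.
 * Returns 0 otherwise, including on non-x86 builds, where this sketch
 * does not check anything. The function name is illustrative only. */
static int have_x86_crypto_features(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  /* initialise the compiler's CPU feature data */
    return __builtin_cpu_supports("avx2")
        && __builtin_cpu_supports("aes")
        && __builtin_cpu_supports("pclmul");
#else
    return 0;
#endif
}
```

Note that `-march=native` in the command lines above lets the compiler use whatever the build machine supports, so a check like this matters mostly when the binary is run on a different machine.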

Version 20160504 is included in supercop-20160715. Additional variants are available in newer supercop releases.
Full list of implemented variants available in supercop-20161026:

crypto_aead/aeadaes128ocbtaglen128v1/dolbeau/aesenc-int (requires AES)
crypto_aead/aeadaes128ocbtaglen128v1/dolbeau/armv8crypto (requires crypto extension)
crypto_aead/aeadaes256ocbtaglen128v1/dolbeau/aesenc-int (requires AES)
crypto_aead/aeadaes256ocbtaglen128v1/dolbeau/armv8crypto (requires crypto extension)
crypto_aead/aes128gcmv1/dolbeau/aesenc-int (requires AES & PCLMULQDQ)
crypto_aead/aes128gcmv1/dolbeau/armv8crypto (requires crypto extension)
crypto_aead/aes256gcmv1/dolbeau/aesenc-int (requires AES & PCLMULQDQ)
crypto_aead/aes256gcmv1/dolbeau/armv8crypto (requires crypto extension)
crypto_aead/hs1sivhiv2/dolbeau/amd64-avx2 (requires AVX2)
crypto_aead/hs1sivhiv2/dolbeau/amd64-avx512 (requires AVX512F)
crypto_aead/hs1sivhiv2/dolbeau/amd64-sse
crypto_aead/hs1sivhiv2/dolbeau/armv8crypto (doesn't require crypto extension)
crypto_aead/hs1sivlov2/dolbeau/amd64-avx2 (requires AVX2)
crypto_aead/hs1sivlov2/dolbeau/amd64-avx512 (requires AVX512F)
crypto_aead/hs1sivlov2/dolbeau/amd64-sse
crypto_aead/hs1sivv2/dolbeau/amd64-avx2 (requires AVX2)
crypto_aead/hs1sivv2/dolbeau/amd64-avx512 (requires AVX512F)
crypto_aead/hs1sivv2/dolbeau/amd64-sse
crypto_core/aes128decrypt/dolbeau/aesenc-int (requires AES)
crypto_core/aes128decrypt/dolbeau/armv8crypto (requires crypto extension)
crypto_core/aes128decrypt/dolbeau/std-1rt-nodk
crypto_core/aes128decrypt/dolbeau/std-2rt-nodk
crypto_core/aes128decrypt/dolbeau/std-4rt-nodk
crypto_core/aes128encrypt/dolbeau/aesenc-int (requires AES)
crypto_core/aes128encrypt/dolbeau/armv8crypto (requires crypto extension)
crypto_core/aes128encrypt/dolbeau/std-1ft
crypto_core/aes128encrypt/dolbeau/std-2ft
crypto_core/aes128encrypt/dolbeau/std-4ft
crypto_core/aes256decrypt/dolbeau/aesenc-int (requires AES)
crypto_core/aes256decrypt/dolbeau/armv8crypto (requires crypto extension)
crypto_core/aes256decrypt/dolbeau/std-1rt-nodk
crypto_core/aes256decrypt/dolbeau/std-2rt-nodk
crypto_core/aes256decrypt/dolbeau/std-4rt-nodk
crypto_core/aes256encrypt/dolbeau/aesenc-int (requires AES)
crypto_core/aes256encrypt/dolbeau/armv8crypto (requires crypto extension)
crypto_core/aes256encrypt/dolbeau/std-1ft
crypto_core/aes256encrypt/dolbeau/std-2ft
crypto_core/aes256encrypt/dolbeau/std-4ft
crypto_hashblocks/sha256/dolbeau/amd64-sha (requires SHA)
crypto_hashblocks/sha256/dolbeau/armv8crypto (requires crypto extension)
crypto_hashblocks/sha512/dolbeau/intelavx2rorxasm (uses Intel ASM, requires AVX2)
crypto_hashblocks/sha512/dolbeau/intelavxasm (uses Intel ASM, requires AVX)
crypto_hashblocks/sha512/dolbeau/intelsse4asm (uses Intel ASM, requires SSE4)
crypto_stream/aes256ctr/dolbeau/aesenc-int (requires AES)
crypto_stream/chacha12/dolbeau/amd64-avx2 (requires AVX2)
crypto_stream/chacha12/dolbeau/arm-neon
crypto_stream/chacha12/dolbeau/mipsel-msa
crypto_stream/chacha12/dolbeau/ppc-altivec
crypto_stream/chacha20/dolbeau/amd64-avx2 (requires AVX2)
crypto_stream/chacha20/dolbeau/arm-neon
crypto_stream/chacha20/dolbeau/mipsel-msa
crypto_stream/chacha20/dolbeau/ppc-altivec
crypto_stream/chacha8/dolbeau/amd64-avx2 (requires AVX2)
crypto_stream/chacha8/dolbeau/arm-neon
crypto_stream/chacha8/dolbeau/mipsel-msa
crypto_stream/chacha8/dolbeau/ppc-altivec
crypto_stream/salsa20/dolbeau/amd64-xmm6int
crypto_stream/salsa2012/dolbeau/amd64-xmm6int
crypto_stream/salsa208/dolbeau/amd64-xmm6int

GPU

AES

I've written a hybrid AES-256-GCM implementation in CUDA and NEON for the Jetson TK1 platform (based on the Tegra K1 SoC). The implementation includes a large family of AES kernels in CUDA.

There is also support for GCM using PCLMULQDQ on x86-64 CPUs (now with faster unrolled-by-8 version).
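For readers unfamiliar with PCLMULQDQ: the instruction computes a carry-less (polynomial over GF(2)) product of two 64-bit operands, which is the building block of the GF(2^128) multiplication in GCM's authentication. The sketch below shows what that product is, in plain portable C. It is an illustration only, not the code from the archive; the name `clmul64` is hypothetical, and the loop branches on the operand bits, so it is neither fast nor constant-time.

```c
#include <stdint.h>

/* Carry-less product of two 64-bit values, i.e. multiplication of
 * polynomials with coefficients in GF(2). The 128-bit result is
 * returned as two 64-bit halves, *hi (upper) and *lo (lower).
 * Bit-by-bit reference sketch: not constant-time, illustrative name. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;              /* low part of a * x^i */
            if (i > 0)
                h ^= a >> (64 - i);   /* bits shifted past bit 63 */
        }
    }
    *hi = h;
    *lo = l;
}
```

As a small example, multiplying 3 by 3 (that is, (x + 1) by (x + 1)) gives 5 (x^2 + 1) rather than 9, because the middle terms cancel instead of carrying.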

My first results are described in a paper titled An hybrid AES-256-GCM implementation for NEON CPU & CUDA GPU. Full code for all the evaluated implementations and the tests are available here.

The paper aes_gcm_gpu.pdf was last modified on 2014-11-05.

The archive aes_gcm_gpu-20141214.tgz was last modified on 2014-12-14.

Chacha20

There is also a fairly straightforward implementation of Chacha20 in CUDA.

The archive chacha_gpu-20141129.tgz was last modified on 2014-11-29.

Parallella

This is an experiment in coding stream algorithms (chacha20 and AES-256 in CTR mode) on the Adapteva Parallella. The blocks are generated on the Epiphany chip and brought back to the Cortex A9 for XORing with the message.
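The host-side step described above amounts to combining the keystream blocks with the message. A minimal sketch of that step, assuming the keystream has already been copied into host memory, could look like the following. The function name `xor_keystream` and its signature are hypothetical and not taken from the archive.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR len bytes of keystream with msg and store the result in out.
 * out may be the same buffer as msg (in-place operation).
 * Illustrative name and signature, not the archive's interface. */
static void xor_keystream(uint8_t *out, const uint8_t *msg,
                          const uint8_t *keystream, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = msg[i] ^ keystream[i];
}
```

Because XOR is its own inverse, applying the same function with the same keystream to the ciphertext recovers the message, which is why stream ciphers of this kind use one routine for both encryption and decryption.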

The archive epi_crypto-20160827.tgz was last modified on 2016-08-27.

Romain Dolbeau

Last modified: Tue Oct 15 14:24:45 CEST 2019