Emscripten, Clojurescript, and Cryptography

By Becker Polverini

An old adage in cryptography is that one should never “roll his or her own crypto.” Besides being really hard to get right, it is stress-inducing, time-consuming, and tedious. It quickly becomes a black hole of code review and worry.

This article looks at how we took good C crypto and called it from our Clojure backend and our Clojurescript frontend without changing a single line of the trusted code.

Before we get started, we should mention that browser crypto is a bad idea if your goal is to fight the Man. It is, on the other hand, a good idea if the goal is to prevent spreading sensitive data across caches and microservices. For example, if we delete a key we could never read (it’s encrypted client-side), it is now rendered inert everywhere in our backend.

In the case of Balboa, we never want to see our users’ data and we love the literature about crypto providing elegant solutions to access control. Also, since Asm.js keeps getting faster and more available, this may pay greater dividends in future browser releases.

The dream: trust once, run anywhere

Our big question was this: “Is it possible to take something developers already trust and call it as faithfully as possible from Clojure and Clojurescript?”

After some research, we had a gut feeling it would look something like the above. It seemed sensible enough, so we set to work finding suitable algorithms for the experiment.

After exploring some potential issues with Emscripten compilation (timing attacks, memset calls being silently removed, etc.), we went looking for a PRF that would be simple for emcc to compile faithfully. We decided on Skein/Threefish.

Skein’s NIST x86 implementation had no strange assembly instructions to worry about and had already gone through the review process of the SHA-3 competition. In a later post, we will talk about some of the tricks we used for NaCl and scrypt, which will cover tradeoffs around when to write Clojurescript or C.

The reality: uint8_t* as lingua franca

The first hurdle we encountered with JNI and Emscripten is what to do with structs. In the case of something like Skein, there is a struct, Skein1024_Ctxt_t, that contains all of the state for the pseudorandom function. This struct must be initialized and passed around for any incremental hashing operations.

In Emscripten, you can only pass a number (int, float, void*) or a string across the boundary. In Java, passing objects across the JNI boundary is taxing as well.

One way to share code portably across the platforms was to memcpy structs back and forth into uint8_t* buffers. Once a struct is represented as a uint8_t* and passed up to Clojure or Clojurescript as a byte[] or a Uint8Array respectively, it can no longer be sensibly mutated. Fortunately, all modifications happen in C, where the uint8_t* can be memcpy’d back into a struct before any operation is performed on it.

skein_shim.c wraps the initialization and teardown of these uint8_t* into structs so that the API exposed to both Emscripten and JNI is always uint8_t*. For Emscripten, HEAPU8, the heap in the Asm.js virtual machine, is of type Uint8Array, so no translation of the sort JNI needs for byte[] to uint8_t* is required. Instead, a set call copies a buffer that lives outside Emscripten’s heap into HEAPU8, which is a far simpler task.

Love means never having to OOM

If you want Emscripten code to run quickly, you generally have to set ALLOW_MEMORY_GROWTH=0 at Emscripten compile time, which forces you to work with a fixed amount of heap. Every malloc must therefore be paired with a free. Clojurescript offers some really elegant means of controlling your memory usage with Emscripten.

Like most cool things with Clojure(script), it involves a macro:

Now, in Clojurescript, we can do Emscripten memory management in a more friendly way than straight Javascript. The fun doesn’t just stop here: One could easily improve the above with a version that clears memory before returning it to HEAPU8, for the data that merits the treatment.
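For readers who think in plain Javascript, the same scoped-allocation idea can be sketched as a higher-order function. This is only an illustration of the pattern, not the macro itself: makeFakeModule is a hypothetical stand-in for an Emscripten Module (a real build exposes HEAPU8, _malloc, and _free), and withMalloc and its wipe flag are names invented for this sketch.

```javascript
// Hypothetical stand-in for an Emscripten Module: a fixed-size heap with a
// bump allocator. A real build provides HEAPU8, _malloc and _free itself.
function makeFakeModule(size) {
  const mod = {
    HEAPU8: new Uint8Array(size),
    live: new Map(), // ptr -> length of each outstanding allocation
    next: 8,         // keep address 0 unused so it can act as NULL
    _malloc(n) {
      if (mod.next + n > size) return 0; // out of heap
      const ptr = mod.next;
      mod.next += n;
      mod.live.set(ptr, n);
      return ptr;
    },
    _free(ptr) {
      mod.live.delete(ptr);
    },
  };
  return mod;
}

// Allocate n bytes, hand the pointer to body, and always free afterwards,
// even if body throws. With wipe set, the region is zeroed before it is
// handed back to the allocator.
function withMalloc(mod, n, body, wipe = false) {
  const ptr = mod._malloc(n);
  if (ptr === 0) throw new Error("out of Emscripten heap");
  try {
    return body(ptr);
  } finally {
    if (wipe) mod.HEAPU8.fill(0, ptr, ptr + n);
    mod._free(ptr);
  }
}

// Usage: write three bytes into a scratch buffer and read the first one back.
const fake = makeFakeModule(64);
const firstByte = withMalloc(fake, 16, (ptr) => {
  fake.HEAPU8.set([1, 2, 3], ptr);
  return fake.HEAPU8[ptr];
}, true);
```

The try/finally block plays the role the macro plays in Clojurescript: the caller never sees a pointer that outlives its allocation, and the wipe option is one way to get the memory-clearing variant for sensitive data.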

Calling from Clojurescript

Now that the lower-level plumbing is in place, we can look at an example of making a C HMAC function callable from Clojurescript.

It involves three parts: First, how to wrap Emscripten-compiled functions in something callable from Javascript; second, how to properly handle the heap; and, third, how to wrap the Javascript function in something more Clojurescript friendly.

Above is nothing more than a simple wrapper for Module.cwrap(name, ret_type, arg_types), the mechanism for making functions callable from Javascript. We elected to make a map for translating data-types into the Emscripten representation, just because it is more explicit and easier for comparison against, say, a C header file. You’ll notice that long is a vector. This is because Emscripten represents 64-bit integers as two numbers, since 2^53 - 1 is the largest safe integer representable in Javascript.
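A rough Javascript sketch of the type-map idea might look like the following. It is not the original wrapper: TYPES, cwrapTypes, splitLong, wrap, and the function name hmac_update are all illustrative, the stub object only mimics Module.cwrap so the snippet runs without an Emscripten build, and passing the low 32-bit half before the high half is an assumption about the calling convention.

```javascript
// Hypothetical map from C-style type names to cwrap type strings.
// "long" expands to two numbers: the low and high 32-bit halves.
const TYPES = {
  int: ["number"],
  ptr: ["number"],
  str: ["string"],
  long: ["number", "number"],
};

// Flatten a list of C-style type names into the flat array cwrap expects.
function cwrapTypes(names) {
  return names.flatMap((name) => {
    if (!(name in TYPES)) throw new Error("unknown type: " + name);
    return TYPES[name];
  });
}

// Split a 64-bit BigInt into [low, high] unsigned 32-bit numbers
// (low half first is an assumption, see the lead-in).
function splitLong(value) {
  const v = BigInt.asUintN(64, value);
  return [Number(v & 0xffffffffn), Number(v >> 32n)];
}

// Thin wrapper over Module.cwrap(name, returnType, argTypes).
function wrap(mod, name, returnType, argNames) {
  return mod.cwrap(name, returnType, cwrapTypes(argNames));
}

// Stub standing in for the Emscripten Module; it only records the request.
const stub = { cwrap: (name, ret, args) => ({ name, ret, args }) };
const sig = wrap(stub, "hmac_update", "number", ["ptr", "ptr", "long"]);
const halves = splitLong(2n ** 53n + 1n);
```

Keeping the map keyed by C-style names means a wrapped signature can be read side by side with the corresponding line of a header file.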

Manipulating data inside Emscripten requires copying data in and out of its own heap. We used the above methods for bringing Uint8Array back and forth.

For data being passed into the heap, the method is as follows: After malloc’ing space, a slice of the whole HEAPU8 is set.

In order to take data out of Emscripten, data is copied out of the heap, back into a regular Uint8Array, by representing the “memory address” and range as a slice of HEAPU8, and copying it into a new buffer. The cloned array is returned, and the allocation of Emscripten heap is free’d upon returning.
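In plain Javascript, the two directions of copying described here can be sketched roughly as follows. The names copyIn, copyOut, and heapModule are invented for this example, and heapModule is a hypothetical stand-in for the Emscripten Module with a toy allocator, so treat it as a sketch of the technique rather than our exact code.

```javascript
// Hypothetical stand-in for the Emscripten Module: a 256-byte heap with a
// toy bump allocator. A real build provides HEAPU8, _malloc and _free.
const heapModule = (() => {
  const HEAPU8 = new Uint8Array(256);
  let next = 8; // keep address 0 unused so it can act as NULL
  return {
    HEAPU8,
    freed: [],
    _malloc(n) {
      const ptr = next;
      next += n;
      return ptr;
    },
    _free(ptr) {
      this.freed.push(ptr);
    },
  };
})();

// Copy a Uint8Array into the heap. The caller owns the returned pointer
// and is responsible for freeing it (copyOut does so below).
function copyIn(mod, bytes) {
  const ptr = mod._malloc(bytes.length);
  mod.HEAPU8.set(bytes, ptr);
  return ptr;
}

// Copy len bytes out of the heap into a fresh Uint8Array, then free the
// heap allocation. subarray is only a view onto the heap; wrapping it in
// new Uint8Array(...) makes an independent copy.
function copyOut(mod, ptr, len) {
  const out = new Uint8Array(mod.HEAPU8.subarray(ptr, ptr + len));
  mod._free(ptr);
  return out;
}

// Usage: round-trip four bytes through the heap.
const input = Uint8Array.of(0xde, 0xad, 0xbe, 0xef);
const inputPtr = copyIn(heapModule, input);
const roundTrip = copyOut(heapModule, inputPtr, input.length);
```

Because copyOut returns a copy rather than a view, later writes to the heap cannot change the result after its allocation has been freed.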

Our goal was to make sure this copying back and forth was negligible with respect to the amount of work being done in Emscripten-space.

And, now, the payoff: We can generate the HMAC for a given message! About time, if you ask me.

Becker is a Founding Partner & CEO of PKC