Compiling to the Web:

Getting Started With
asm.js and Emscripten


GDC 2014

Alon Zakai/@kripken, Luke Wagner

asm.js


A strict subset of JavaScript, the industry-standard language that runs in all web browsers


Designed to be very easy to optimize and run close to native speed

Emscripten


An open source compiler from C/C++ to asm.js


Uses LLVM and clang


Emscripten & asm.js let you run your C/C++ code in web browsers, at high speed, without plugins

Emscripten & asm.js:
Growing adoption


Unity

Unigine

Minko

Torque 2D

Godot

Unreal

Nebula3

OpenFL

Cocos2D-X

Cube 2

etc.

It's pretty easy to run a C++
codebase on the web these days!


1. Use an engine that already supports Emscripten (Unity, Unreal, etc.)

2. Port your own in-house engine (IMVU, etc.)


Tools for this are open source, free, and mature


Porting your own code is what I'll focus on


This info should also be useful if you're using an engine someone else already ported


In the second half of the talk, Luke will go into detail about asm.js and performance

Ok, I have some C++ code.
What do I do?

Hello world


  // helloworld.cpp
  #include <stdio.h>
  int main() {
    printf("hello, world!\n");
    return 0;
  }



  $ emcc helloworld.cpp
  $ node a.out.js
  hello, world!

Hello world


  $ emcc helloworld.cpp -o output.html
  $ ls output.*
  output.html output.js
  $ firefox output.html    # or any other modern browser

When we run...

  $ emcc helloworld.cpp -o output.html

...what actually happens?


We are running a cross-compiler: generating code for a platform other than the one you are currently on


Similar to developing a mobile game on a desktop!

Cross-compiling


Cannot use your system headers or libraries (they are OS-specific x86 binaries)


Code must be portable - no inline x86 assembly, etc.

Undefined Behavior


A possible issue with cross-compiling, for example:


  char buffer[8];
  int *i = (int*)&buffer[1]; /* unaligned! */
  *i = 10;

This works on x86, can fail on ARM


Emscripten's JavaScript output, like ARM, will not produce correct results for this code


emcc SAFE_HEAP option makes debugging this easy
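
A minimal sketch of why the unaligned store above goes wrong in compiled output, assuming (as the heap access slides in part 2 describe) that a 32-bit store becomes an Int32Array write indexed by address >> 2. The names heap, HEAP32 and storeInt32 are illustrative stand-ins, not actual Emscripten output:

```javascript
// Hypothetical stand-in for the compiled program's memory.
const heap = new ArrayBuffer(16);
const HEAP32 = new Int32Array(heap);

// A 32-bit store in heap terms: the byte address is shifted down to an
// Int32Array index, which silently drops the low two bits.
function storeInt32(addr, value) {
  HEAP32[addr >> 2] = value;
}

const buffer = 0;            // pretend `char buffer[8]` lives at address 0
storeInt32(buffer + 1, 10);  // the unaligned `*i = 10` from the slide

// The write landed on the aligned word at address 0, not at address 1.
const wordAtZero = HEAP32[0];
const wordAtOne = new DataView(heap).getInt32(1, true);
console.log(wordAtZero, wordAtOne);
```

The value ends up in the word at address 0, so a later read through a correctly computed pointer to buffer[1] would not see it.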

Undefined Behavior


Build your project with -Werror; it can catch many things!


  // src.c
  int main() {
    printf("hello, world!\n");
  }



$ emcc src.c -Werror
error: implicitly declaring library function 'printf'
  printf("hello, world!\n");
  ^
tests/hello_world.c:2:3: note: please include <stdio.h>

Event Loop


Code on the web must run in short events


Not returning control to the browser can lead to the dreaded "slow script dialog" :(

Dialog boxes are nicer today ;) but the problem remains the same

Instead of this...



  int main() {
    init();
    while (is_game_running()) {
      do_frame();
    }
    return 0;
  }

...we need this



  #include <emscripten.h>

  void do_web_frame() {
    if (!is_game_running()) {
      emscripten_cancel_main_loop();
      return;
    }
    do_frame();
  }

  int main() {
    init();
    emscripten_set_main_loop(do_web_frame, 0, 0);
    return 0;
  }
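
Conceptually, the main loop API hands one frame at a time to the browser's scheduler instead of spinning in a while loop. The sketch below illustrates that idea only; setMainLoop and its schedule parameter are hypothetical names, not Emscripten's implementation (in a browser, schedule would typically be requestAnimationFrame):

```javascript
// Conceptual sketch: run `frame` once per scheduled tick until cancelled.
// `schedule` stands in for requestAnimationFrame in a browser.
function setMainLoop(frame, schedule) {
  let running = true;
  function tick() {
    if (!running) return;
    frame();                      // one short event, then yield
    if (running) schedule(tick);  // ask to be called again later
  }
  schedule(tick);
  return { cancel() { running = false; } };
}

// Usage with a hand-rolled queue so the example runs outside a browser.
const queue = [];
let framesRun = 0;
const loop = setMainLoop(() => {
  framesRun++;
  if (framesRun === 3) loop.cancel();  // plays the role of !is_game_running()
}, (fn) => queue.push(fn));

while (queue.length > 0) queue.shift()();
console.log(framesRun); // 3
```

Each call to the frame function returns promptly, which is what keeps the page responsive.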

Getting a Window, I/O, etc.


Emscripten supports a few C APIs for this:

HTML5, SDL, glut, glfw


Can use SDL, glut etc. for existing codebases


HTML5 is a good option for new codebases or if you know HTML5 already

html5.h



  #include <emscripten.h>
  #include <emscripten/html5.h>

  int main() {
    emscripten_set_canvas_size(1024, 768);
    emscripten_set_keydown_callback(0, 0, 1, key_callback);
    emscripten_set_main_loop(do_web_frame, 0, 0);
    return 0;
  }

SDL



  #include <SDL.h>
  #include <emscripten.h>

  int main() {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Surface *screen =
      SDL_SetVideoMode(1024, 768, 32, SDL_HWSURFACE);
    emscripten_set_main_loop(do_web_frame, 0, 0);
    /* use SDL_PollEvent to receive input etc. */
    return 0;
  }

Debugging


SDL, glut etc. are good to use because you can build the exact same code for both native and web


Can debug cross-platform issues normally on the native build

Debugging


For web-specific issues, can debug web builds directly


emcc has various levels of debuggability of generated code using the -g argument

Debugging


Optimized (-O2 and above) output is minified by default


  function a(a,b){a=a|0;b=b|0;f(a+b|0);}

Debugging


-g1 : preserve whitespace


  function a(a, b) {
    a = a | 0;
    b = b | 0;
    f(a + b | 0);
  }

Debugging


-g2 : preserve function names


  function _addAndPrint(a, b) {
    a = a | 0;
    b = b | 0;
    _printAnInteger(a + b | 0);
  }

Debugging


-g3 (or just -g) : preserve variable names


  function _addAndPrint($left, $right) {
    $left = $left | 0;
    $right = $right | 0;
    _printAnInteger($left + $right | 0);
  }


They will not always exactly match original variable names in source, but are often quite close

Debugging


-g4 : source maps


Show the C/C++ source code in your browser's debugger!


Works in Firefox, Chrome and Safari

Source Maps in Firefox

Source Maps in Chrome

Manual Debugging


Manual debugging works too, even easier than on native!



  function _addAndPrint($left, $right) {
    $left = $left | 0;
    $right = $right | 0;
    //---
    if ($left < $right) console.log('l<r at ' + stackTrace());
    //---
    _printAnInteger($left + $right | 0);
  }


Can add debug printouts that execute arbitrary JavaScript, show stack traces, etc.

Rendering: OpenGL


Best: The subset of OpenGL ES 2.0 that maps directly to WebGL :)


Basically ES 2.0 minus clientside data and a few other minor things

OpenGL Emulation Options


Ok: General OpenGL ES 2.0 including non-WebGL features (FULL_ES2 flag)


We emulate things like clientside data for you, adds some overhead


Bad: Older OpenGL 1.x stuff (LEGACY_GL_EMULATION flag)


Much works, but a lot doesn't, and emulation overhead is significant

OpenGL: summary


Use the WebGL-friendly subset of OpenGL ES 2.0


void do_web_frame() {
  /* normal GLES 2.0 code, no clientside data */
  glClear(GL_DEPTH_BUFFER_BIT);
  glBindBuffer(GL_ARRAY_BUFFER, ab);
  glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, eb);
  glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 24, 0);
  glUseProgram(p);
  ..
  glDrawArrays(GL_TRIANGLE_STRIP, 0, n);
  ..
}

Fast, familiar, same code runs in native builds!

Now on to part 2!

Alon showed how easy it is to run your C++ on the web

How do we know it'll run well?

asm.js performance on large codebases

Still, you may be wondering:

Does this performance depend on fancy, brittle, unreliable, secret, or unreproducible optimizations?

tl;dr: no

I'd like to explain why

The Anatomy of Emscripten output

The contents of the big .js file can be broken down into:

  • One giant asm.js module in the middle
    • Contains compiled C++ and some standard library
    • Adheres to the asm.js spec
  • Boilerplate code before and after
    • Contains the rest of the standard library
    • Connects the compiled code to the browser
    • Idiomatic JS, readable, and commented

The Anatomy of Emscripten output

The structure of the big asm.js module:


// The outer function acts as a module
function asmModule(global, imports, heap) {
  // Hint to the JS engine that this is asm.js (does not change behavior):
  "use asm";

  // Module imports, callable from compiled functions
  var puts = imports.puts;

  // C++ functions are compiled into JS functions
  function main() {
    puts(8); // printf("Hello, World!\n");
  }

  // Module export, callable from outside code
  return { main:main };
}

By encapsulating everything in a closure, asm.js provides invariants that the JS engine uses to optimize.
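
To make the module structure concrete, here is a sketch of how surrounding glue code might link and call such a module. The puts implementation, the heap size, and the choice to place the string at address 8 are assumptions made for illustration; real Emscripten boilerplate is more involved:

```javascript
// The module from the slide, unchanged in structure.
function asmModule(global, imports, heap) {
  "use asm";
  var puts = imports.puts;
  function main() {
    puts(8); // address of the string in the heap
  }
  return { main: main };
}

// Hypothetical glue: set up memory and place a C string at address 8.
const heap = new ArrayBuffer(0x10000);
const HEAPU8 = new Uint8Array(heap);
const text = "Hello, World!";
for (let i = 0; i < text.length; i++) HEAPU8[8 + i] = text.charCodeAt(i);
HEAPU8[8 + text.length] = 0; // NUL terminator

// Hypothetical import: read a NUL-terminated string out of the heap.
const printed = [];
function puts(ptr) {
  let s = "";
  for (let p = ptr; HEAPU8[p] !== 0; p++) s += String.fromCharCode(HEAPU8[p]);
  printed.push(s);
  return 0;
}

const instance = asmModule(globalThis, { puts: puts }, heap);
instance.main();
console.log(printed[0]); // Hello, World!
```

The compiled code only ever sees integers; turning address 8 back into text is the job of the import on the outside.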

Type system basics

Variables have types that are "declared" by their initializer


var d = 3.14;   // d : double
var e = 0.0;    // e : double

var i = 42;     // i : int
var j = 0;      // j : int
          

Type system basics

Formal parameters are "declared" by coercions


function f(d, i) {
  d = +d;   // d : double
  i = i|0;  // i : int
  ...
}
        

Type system basics

Assignments cannot cross types without explicit coercion


function f(d) {
  d = +d;      // d : double
  var i = 0;   // i : int

  // i = d;    // type error
  i = ~~d;     // ~~ : double -> int
  d = +i;      //  + : int -> double
}
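
Because these coercions are ordinary JavaScript operators, their behavior can be checked in any console without "use asm". A small sketch (the helper names are made up for illustration):

```javascript
// Plain JS; the operators behave the same inside or outside asm.js.
const toInt = (d) => ~~d;      // double -> int: truncates toward zero
const toDouble = (i) => +i;    // int -> double: value is unchanged
const asInt32 = (x) => x | 0;  // wraps into the signed 32-bit range

console.log(toInt(3.99), toInt(-3.99)); // 3 -3
console.log(toDouble(7));               // 7
console.log(asInt32(2147483648));       // -2147483648
console.log(asInt32(4294967296 + 5));   // 5
```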

Type system basics

Type errors are printed as warnings in the console:

test.js:7:8 warning: asm.js type error: double is not a subtype of int

Error or not, the code always behaves according to the JS spec

Console also reports success:

Successfully compiled asm.js code (total compilation time 115ms; stored in cache)

Type system basics

Argument types must match "declared" parameter types.


function f(i) {  // f : int -> void
  i = i|0;
}

function g() {
  var d = 3.14;  // d : double
  var i = 42;    // i : int

  // f(d);       // type error
  f(i);
}

This means that parameter coercions are effectively no-ops.

For example, Firefox generates this code for the call to f:

mov    $0x2a,%edi
callq  <f>

Arithmetic

The definition of addition in JS looks slow:

11.6.1 The Addition operator

The production
  AdditiveExpression : AdditiveExpression + MultiplicativeExpression
is evaluated as follows:

 1. Let lref be the result of evaluating AdditiveExpression.
 2. Let lval be GetValue(lref).
 3. Let rref be the result of evaluating MultiplicativeExpression.
 4. Let rval be GetValue(rref).
 5. Let lprim be ToPrimitive(lval).
 6. Let rprim be ToPrimitive(rval).
 7. If Type(lprim) is String or Type(rprim) is String, then
   a. Return the String that is the result of concatenating ToString(lprim)
      followed by ToString(rprim)
 8. Return the result of applying the addition operation to ToNumber(lprim)
    and ToNumber(rprim).

Arithmetic

If we know we have two numbers, the definition reduces to:

 8. Return the result of applying the addition operation to lnumber and rnumber.

However, "number" means IEEE754 double addition.

Arithmetic

Given two integers x and y, we have the algebraic identity


ToInt32(double(x) + double(y)) = x + y mod 2^32

which means that


// x : int
// y : int
(x + y)|0

can be compiled to a single instruction

addl %eax, %ebx

without any type or overflow checks.
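
The identity holds because the sum of two 32-bit integers always fits exactly in a double. A quick sketch that checks it against exact BigInt arithmetic (addWrapped and addAsm are illustrative names):

```javascript
// Reference result: exact integer addition, wrapped to signed 32 bits.
function addWrapped(x, y) {
  return Number(BigInt.asIntN(32, BigInt(x) + BigInt(y)));
}

// The asm.js form: double addition followed by the |0 coercion.
function addAsm(x, y) {
  x = x | 0;
  y = y | 0;
  return (x + y) | 0;
}

const samples = [[1, 2], [2147483647, 1], [-2147483648, -1], [-5, 5]];
const allMatch = samples.every(([x, y]) => addAsm(x, y) === addWrapped(x, y));

console.log(addAsm(2147483647, 1)); // -2147483648
console.log(allMatch);              // true
```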

Arithmetic

To capture this rule in the type system, asm.js has a type "intish" which means "must be coerced before use".

          +,- : (int, int) -> intish  OR
                (double, double) -> double

  |,&,^,<<,>> : (intish, intish) -> signed
            ~ : intish -> signed
          >>> : (intish, intish) -> unsigned

These types are related by a subtyping relation:

int <: intish
signed <: int
unsigned <: int

Arithmetic

For example,


function f(i, j, k) {
  i = i|0;
  j = j|0;
  k = k|0;
  return (((i + j) & k) + j)|0;
}

type checks and generates this code in Firefox:

add    %esi,%edi
and    %edx,%edi
mov    %edi,%eax
add    %esi,%eax
retq

Arithmetic

Multiplication poses a problem, since the product of two 32-bit integers can exceed 2^53 and be rounded as a double:


ToInt32(double(x) * double(y)) != x * y mod 2^32

This is a general problem, so ES6 added Math.imul which does simple int32 multiplication.

Math.imul has been implemented in Safari, Chrome, Firefox.
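
A small sketch of the difference, using an input pair chosen so that the exact product does not fit in a double:

```javascript
const x = 0x7fffffff;
const y = 0x7fffffff;

// Double multiplication rounds the 62-bit product before truncating.
const viaDouble = (x * y) | 0;

// Math.imul does the multiplication in 32-bit integer arithmetic.
const viaImul = Math.imul(x, y);

// Exact reference computed with BigInt, wrapped to signed 32 bits.
const exact = Number(BigInt.asIntN(32, BigInt(x) * BigInt(y)));

console.log(viaDouble, viaImul, exact); // 0 1 1
```

The low bit of the product is lost in the double version, which is exactly the kind of mismatch with C semantics that Math.imul avoids.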

Arithmetic

In asm.js, Math.imul can be imported:


function asmModule(global) {
  "use asm";
  var imul = global.Math.imul;
  function f(i, j) {
    i = i|0;
    j = j|0;
    return imul(i, j)|0;
  }
  ...
}

which generates the expected:

imul  %esi,%eax

Float

Using ES6 Math.fround, asm.js also has a float type:


function asmModule(global) {
  "use asm";
  var fround = global.Math.fround;
  function f(x) {
    x = fround(x);         // x : float
    var y = fround(3.14);  // y : float
    return fround(x + y);  // also -, *, /
  }
  ...
}
        

which generates this code in Firefox:

movss  0x11fd(%rip),%xmm1  # 3.14
addss  %xmm1,%xmm0
retq
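
Math.fround rounds a double to the nearest 32-bit float, which is the same rounding a Float32Array store performs. A sketch that shows the effect (addFloat and roundViaTypedArray are illustrative names):

```javascript
const fround = Math.fround;

// Float addition as asm.js expresses it: round inputs, add, round result.
function addFloat(x, y) {
  return fround(fround(x) + fround(y));
}

// Cross-check: storing into a Float32Array applies the same rounding.
const f32 = new Float32Array(1);
function roundViaTypedArray(v) {
  f32[0] = v;
  return f32[0];
}

console.log(fround(3.14) === 3.14);                     // false
console.log(fround(3.14) === roundViaTypedArray(3.14)); // true
console.log(addFloat(1.5, 3.14) === roundViaTypedArray(fround(1.5) + fround(3.14))); // true
```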

Other features hopefully coming to ES6/7:

  • Bitwise operations: Math.clz32, Math.popcnt32, ...
  • Types uint64, int64, ...
  • SIMD operations:
    • Types float32x4, uint32x4, ...
    • Operations float32x4.add, mul, min, shuffle, ...
    • Intel working on implementation in both Firefox and Chrome

Heap access

The C++ heap is represented by a big typed array.


function asmModule(global, foreign, heap) {
  "use asm";
  var HEAP32 = new global.Int32Array(heap);
  var HEAPF32 = new global.Float32Array(heap);
  var HEAPF64 = new global.Float64Array(heap);
  ...
}
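
All of these views alias the same underlying bytes, which is what lets C-style type punning work. A minimal sketch outside any asm.js module (the 64KB heap size is an arbitrary choice for the example):

```javascript
const heap = new ArrayBuffer(0x10000);
const HEAP32 = new Int32Array(heap);
const HEAPF32 = new Float32Array(heap);

HEAPF32[1] = 1.0;        // store a float at byte address 4
const bits = HEAP32[1];  // load the same four bytes as an int

console.log(bits.toString(16)); // 3f800000
```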
        

Heap access

Heap load:


function g(i) {
  i = i|0;
  return HEAP32[i>>2]|0;
}
        

The >>2 shift has the effect of masking off the low two bits of the address, hence the alignment warning in Alon's half of the talk.

In the future, asm.js could include DataView to avoid the alignment restriction.
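
For comparison, this is how DataView handles an unaligned address in plain JavaScript today. It is only a sketch of the idea; how (or whether) asm.js would expose DataView is not specified here:

```javascript
const heap = new ArrayBuffer(16);
const HEAP32 = new Int32Array(heap);
const view = new DataView(heap);

// DataView takes a byte offset directly, so address 1 is fine.
view.setInt32(1, 10, true);                 // little-endian store at byte 1
const viaDataView = view.getInt32(1, true); // reads the same four bytes

// The shifted typed-array load reads the aligned word at address 0 instead.
const viaShift = HEAP32[1 >> 2];

console.log(viaDataView); // 10
console.log(viaShift === 10);
```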

Heap access

Generated code (64-bit) in Firefox:

and    $0xfffffffc,%edi
mov    (%r15,%rdi,1),%eax

The alignment mask may be eliminated via GVN.

Firefox reserves a 4GB region of the virtual address space and uses a SIGSEGV handler to avoid bounds checks.

Heap access

Generated code (32-bit) in Firefox:

and    $0xfffffffc,%eax
cmp    $LENGTH,%eax
jae    ...
mov    0xBASEADDR(%eax),%eax

If the asm.js code uses explicit masking:


  HEAP32[(i & MASK) >> 2]
  

the compiled code doesn't need a bounds check:

  and    $COMBINED_MASK,%eax
  mov    0xBASEADDR(%eax),%eax

(Emscripten doesn't do this now, but may add an option later.)
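
A sketch of what explicit masking means at the JavaScript level, assuming a power-of-two heap size so that MASK is simply the size minus one (the function names are illustrative):

```javascript
const heap = new ArrayBuffer(0x10000);  // power-of-two size
const HEAP32 = new Int32Array(heap);
const MASK = heap.byteLength - 1;       // 0xffff

// Masked load: the index can never leave the heap.
function loadMasked(i) {
  i = i | 0;
  return HEAP32[(i & MASK) >> 2] | 0;
}

// Unmasked load: out-of-range reads yield undefined, coerced to 0.
function loadUnmasked(i) {
  i = i | 0;
  return HEAP32[i >> 2] | 0;
}

HEAP32[4 >> 2] = 123;
console.log(loadMasked(4));             // 123
console.log(loadMasked(0x10000 + 4));   // 123
console.log(loadUnmasked(0x10000 + 4)); // 0
```

Note the trade-off: with masking, an out-of-range address wraps around into the heap rather than reading as zero.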

Function pointers

Function pointers are implemented using per-signature tables:


function asmModule() {
  "use asm";

  function f0(i) { i = i|0; ... }
  function f1(i) { i = i|0; ... }

  function g(i) {
    i = i|0;
    TBL[i&1](42);      // 1 = TBL.length - 1
  }

  var TBL = [f0, f1];  // length must be a power of 2

  return g;
}

which generates this code in Firefox:

and    $0x1,%edi
lea    0xTABLE_OFFSET(%rip),%rcx
mov    (%rcx,%rdi,8),%rax
mov    $0x2a,%edi
callq  *%rax
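
A runnable variant of the table example, with the elided function bodies filled in by a hypothetical log import so the dispatch can be observed. The mask keeps any index inside the table, so no bounds check is needed:

```javascript
function tableModule(global, imports) {
  "use asm";
  var log = imports.log;

  function f0(i) { i = i|0; log(0, i|0); }
  function f1(i) { i = i|0; log(1, i|0); }

  function g(i) {
    i = i|0;
    TBL[i&1](42);      // 1 = TBL.length - 1
  }

  var TBL = [f0, f1];  // length must be a power of 2

  return { g: g };
}

// Hypothetical import that records which function ran and with what argument.
const calls = [];
const instance = tableModule(globalThis, {
  log: (which, arg) => { calls.push([which, arg]); }
});

instance.g(0);  // dispatches to f0
instance.g(1);  // dispatches to f1
instance.g(5);  // 5 & 1 === 1, so f1 again
console.log(JSON.stringify(calls)); // [[0,42],[1,42],[1,42]]
```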

Compiler optimization

Emscripten generates asm.js from LLVM IR after LLVM has done front- and middle-end optimizations:

  • Function inlining
  • Algebraic expression simplification
  • Branch folding
  • Redundancy elimination
  • Scalar replacement of aggregates
  • etc.

The JS engine is effectively replacing only the LLVM backend

Compiler optimization

Modern JS engines have advanced compiler backends:

  • SSA-based
  • Optimization passes (GVN, LICM, DCE)
  • Arch-specific codegen optimizations
  • Register allocation
    • Currently, LSRA variants
    • Preparing to use LLVM-derived algorithm in Firefox

Ahead of time compilation

asm.js code can be reliably compiled before running

This compilation strategy has several advantages:

  • Avoid in-game compilation pauses
  • Easy to compile functions in parallel
  • Easy to do all parsing/compilation asynchronously
  • Straightforward to cache the compiled code

This is what Firefox does. Read more at blog.mozilla.org/luke.

Multi-threading

We know we need pthreads.

Basic strategy: share an ArrayBuffer (heap) between Web Workers (threads).

Experimental implementation in progress; hope to demonstrate on real game engines soon.

Standardization will take time, but we're seeing growing support.

In summary

Nothing fancy, just a C/C++ compiler where the backend runs in the client

Questions?