WebGL GPU accelerated matrix operations

[UPDATE -- the WebGL spec has changed since this blog post, so the example will no longer work. However, there is now a WebCL standard in progress.]

A number of people have been talking about the possibilities for a “WebCL” — that is, an extension like WebGL that would allow general-purpose graphics-card-based computing from JavaScript, like OpenCL and its Nvidia-only predecessor CUDA allow from native client-based applications. Doing this would remove many of the bottlenecks people worry about when thinking of porting high-end 3D games to WebGL — imagine if all of your physics simulations could be offloaded to the graphics card.

Aaron Babcock, whose fallingsand WebGL demo I linked to last month, has gone ahead and implemented the first steps towards this, using shaders and render-to-texture to persuade the GPU to multiply two 1024×1024 matrices. This is really impressive; here’s his own description (with a few edits; errors are doubtless mine):

Technologies like CUDA and OpenCL have had good success accelerating matrix operations. Wouldn’t it be cool if the same accelerations could be accessible from JavaScript?

As a proof of concept I took the Sylvester matrix library and modified the multiply function to execute on the GPU. Matrix values are packed into textures, a GLSL program computes the product into a framebuffer, and readPixels is used to retrieve the result. In my benchmarks, stock Sylvester takes about 35 seconds to multiply two 1024×1024 matrices; the GPU-enabled version does the same in about 5 seconds. Perhaps one day complex GPU-accelerated, distributed computing projects like seti@home could use only a simple webpage as a client.
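As a rough illustration of the packing step described above, here is a minimal sketch in JavaScript. It assumes one non-negative integer below 2^24 per RGBA pixel, stored most-significant byte first in the red, green and blue channels with alpha fixed at 255. The function names (packValue, unpackValue, packMatrix) are hypothetical and are not taken from Aaron's source, which may lay values out differently.

```javascript
// Hypothetical helpers, not from the original demo.
// Pack one non-negative integer (< 2^24) into an [r, g, b, a] pixel.
function packValue(value) {
  if (!Number.isInteger(value) || value < 0 || value > 0xFFFFFF) {
    throw new RangeError("value must be an integer in [0, 2^24)");
  }
  return [(value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF, 255];
}

// Inverse of packValue: the alpha byte is ignored.
function unpackValue(pixel) {
  return pixel[0] * 65536 + pixel[1] * 256 + pixel[2];
}

// Flatten a matrix (array of rows) into RGBA bytes in row-major order,
// one pixel per element.
function packMatrix(rows) {
  const height = rows.length;
  const width = rows[0].length;
  const bytes = new Uint8Array(width * height * 4);
  rows.forEach((row, i) => {
    row.forEach((value, k) => {
      bytes.set(packValue(value), (i * width + k) * 4);
    });
  });
  return bytes;
}
```

A byte array like the one returned by packMatrix is the kind of data that could then be uploaded as a texture; the upload itself is not shown here.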

I think it is a pretty cool proof of concept but it would be very interesting to get a discussion going from more experienced OpenGL developers. Here are some problems I see:

  • Only works on WebKit; it locks up Firefox, and Chromium produces incorrect results. [UPDATE: it now works on Minefield for matrices smaller than 500x500.]
  • Packing and unpacking matrix values to and from textures (only handles integers now; could it be more efficient?)
  • The alpha channel in textures cannot be used to store matrix data, leaving only 3 bytes per pixel. [UPDATE: Aaron writes "I found that if I changed the alpha value, it affected all pixel values that resulted from the readPixel call"; some kind of alpha premultiplication going on, perhaps?]
  • If you want to use javascript for long running calculations, browsers will prompt the user every few seconds if they want to stop the script. Any way around this?
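On the alpha question, one way premultiplication could corrupt packed values is sketched below. This only illustrates the hypothesis in the update above, assuming 8-bit channels and simple rounding; it has not been checked against any browser's actual behaviour, and the function names are made up for the example.

```javascript
// Hypothetical model of premultiplied alpha with 8-bit channels.
function premultiply(channel, alpha) {
  return Math.round(channel * alpha / 255);
}

function unpremultiply(channel, alpha) {
  if (alpha === 0) return 0; // colour information is lost entirely
  return Math.min(255, Math.round(channel * 255 / alpha));
}

// What a stored byte would look like after being premultiplied and restored.
function roundTrip(channel, alpha) {
  return unpremultiply(premultiply(channel, alpha), alpha);
}
```

Under this model, roundTrip(200, 255) gives back 200, while roundTrip(200, 10) gives 204. Keeping alpha at 255 would therefore leave the other three bytes intact, and any other alpha value could disturb them.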

Here is an example page [don't visit it in Minefield], and here is a GitHub link for the source.

Right now the demo is WebKit-only, although the concept should be possible on any WebGL implementation. I’d be very interested to know what you and other WebGL developers think.

Let me second that — it would be great to hear what people think about this.

21 Responses to “WebGL GPU accelerated matrix operations”

  1. aa says:

    Very interesting, but OpenCL is much, much faster. Even JavaScript is at this point faster, with static typed arrays and not using objects (mjs).
    Maybe the browser can optimize those shaders?

    Anyway, I would rather have two new languages in the browser, without needing NaCl or anything like that: one statically typed scripting language like Google Go (golang.org) and a GPU-related language like OpenCL, or just a parameter to run scripting languages on a GPU. I believe that with JIT compiling, code can run even faster than C/C++ programs that are compiled more generically.

  2. AaronBabcock says:

    Right now the slowest part of GPGPU from JavaScript is the packing and unpacking of values to and from textures. You pay a high cost pushing data into the GPU and getting data out. Once you are on the GPU, though, computation should be exactly as fast as any other GPGPU approach out there (OpenCL/CUDA).

    The high cost of pushing data to and from the GPU should become less and less significant as you put more and more of the algorithm on the GPU, i.e. taking advantage of the ping pong technique. For instance, my fallingsand demo executes entirely on the GPU after initializing.
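The ping pong technique mentioned here keeps two buffers and alternates between them: each pass reads from one and writes to the other, then the roles swap, so intermediate results never have to leave the GPU. Below is a minimal CPU-side sketch of the pattern in JavaScript, with typed arrays standing in for the two textures. pingPong and doubleStep are hypothetical names; a real WebGL version would bind framebuffers and textures instead.

```javascript
// Run `step` repeatedly, alternating which buffer is read and which is written.
function pingPong(initial, step, iterations) {
  let read = Float32Array.from(initial);
  let write = new Float32Array(read.length);
  for (let n = 0; n < iterations; n++) {
    step(read, write);             // read from one buffer, write to the other
    [read, write] = [write, read]; // swap roles for the next pass
  }
  return read; // after the final swap, `read` holds the latest result
}

// Example pass: double every element (stands in for a shader program).
function doubleStep(src, dst) {
  for (let i = 0; i < src.length; i++) {
    dst[i] = src[i] * 2;
  }
}
```

For example, pingPong([1, 2], doubleStep, 3) runs three passes without copying data between them and ends with the values 8 and 16.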

    Finally, I am hoping that somebody out there can take a look at my packing/unpacking approach and provide a faster, better way of doing it.

  3. You can work around the browser alerting about slow javascript by “yielding” via window.setTimeout.

    function myExpensiveComputation(intermediateResults) {
      if (!intermediateResults) {
        // First call (no intermediate results yet): initialize your calculation
      }
      var newResults = doSomeAmountOfWork(intermediateResults);
      if (!isComplete(newResults)) {
        // Yield to the browser, then resume with the partial results
        window.setTimeout(function () {
          myExpensiveComputation.apply(null, [newResults]);
        }, 0);
      } else {
        // use newResults
      }
    }

  4. I’m sure that’s not quite right (for example, it should be “call” and not “apply” if you’re not using the arguments object), but you get the idea.

  5. aa says:

    Well, at least you could optimize your script a little bit with caching, things like
    (i * width * bytesPerValue)
    and properties like
    bytes.length
    pixels.length

    JavaScript also has a built-in slice method.

    Something like this will already be faster:

    var unpack = function(bytes){
      var value = 0;
      // ignore final byte, which is always 255
      var maxit = bytes.length - 1;
      var power = maxit - 1;
      for(var i = 0; i < maxit; ++i){
        value += bytes[i] * Math.pow(256, power - i);
      }
      return value;
    };
    var elements = [];
    var pixellength = pixels.length;
    if (pixellength == undefined) {
      // pixels is an ImageData-style object: the bytes live in its data property
      pixellength = pixels.data.length;
      pixels = pixels.data;
    }
    for(var i = 0; i < height; ++i){
      var row = [];
      var posrow = i * width * bytesPerValue;
      for(var k = 0; k < width; ++k){
        var poscol = k * bytesPerValue;
        var begin = posrow + poscol;
        var end = begin + bytesPerValue;
        if(!(end <= pixellength))
          throw "dimensions wrong in unpackTexture";
        row.push(unpack(pixels.slice(begin, end)));
      }
      elements.push(row);
    }

  6. AaronBabcock says:

    Gavin,
    I think you are right: that is the only approach to avoid the timeout dialog. Unfortunately that is going to break the Sylvester interface. You can no longer do a simple matrixResult = matrixOne.multiply(matrixTwo). Perhaps that is a small price to pay for GPU acceleration though.

    aa,
    Those are good suggestions; I will try to incorporate those optimizations soon.

    Do you think there is a better approach to packing values into textures? What I really want is to get floating point values from JavaScript to GLSL. There are some bitwise operations I can use in GLSL, but I’m not sure how to get the floating point value from JavaScript into a texture. Any thoughts?

  7. aa says:

    You could compute the 64-bit floating point representation:

    1 bit sign
    11 bits exponent
    52 bits fraction (plus an implicit leading bit, which is zero only for 0.0 and subnormal values)

    and store those bits in two pixels.

  8. AaronBabcock says:

    I was hoping something like that would be possible. Do you have any idea how to implement something like that? Does JavaScript have bitwise operators? I need to end up with 0-255 values that can be pushed into the imgData.data array, which will become the texture sent to the GPU.

    If you have any code snippets or examples, that would be really helpful. Thanks.

  9. aa says:

    http://www.exploringbinary.com/converting-floating-point-numbers-to-binary-strings-in-c/

    It converts to a binary string, but I think it will be helpful.

    JavaScript does indeed have bitwise operators, as documented here:
    https://developer.mozilla.org/En/Core_JavaScript_1.5_Reference/Operators/Bitwise_Operators
    Just be aware that a JavaScript integer only has 53 bits of precision (about 9e15).

    I did know that canvas has 32 bits per pixel: 4 × 8 bits for RGBA.

    Good luck:)
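One way to get at the bytes of a floating point value without doing the bit arithmetic by hand is sketched below. It assumes a JavaScript environment with typed arrays and DataView, which is not something discussed in this thread, and floatToBytes and bytesToFloat are hypothetical names. It covers only the JavaScript side: rebuilding the float inside a shader is not shown.

```javascript
// Split a 64-bit float into its 8 bytes, most significant byte first.
function floatToBytes(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // DataView writes big-endian by default
  const bytes = [];
  for (let i = 0; i < 8; i++) {
    bytes.push(view.getUint8(i)); // each entry is in the range 0-255
  }
  return bytes;
}

// Inverse of floatToBytes.
function bytesToFloat(bytes) {
  const view = new DataView(new ArrayBuffer(8));
  bytes.forEach((b, i) => view.setUint8(i, b));
  return view.getFloat64(0);
}
```

For example, floatToBytes(1) returns [63, 240, 0, 0, 0, 0, 0, 0]: the sign bit and exponent occupy the first byte and a half, and the fraction fills the rest. If only 3 bytes per pixel are usable because of the alpha issue, those 8 bytes would need three pixels rather than two.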

  10. aa says:

    Interesting javascript matrix optimization: https://bugzilla.mozilla.org/show_bug.cgi?id=543119

  11. giles says:

    @aa — sorry, the first time you tried to post that link it got stuck in the spam filter — I’ve approved it now.

  12. AaronBabcock says:

    aa,
    Thanks for the info. I’m going to look more at using floating point numbers all the way through this weekend. If I can do it without a ridiculous slowdown, that would be pretty cool, I think.

    By the way, has anyone else tried it in Minefield? Sometimes it crashes for me on the 1024×1024 multiply, sometimes not. I think I will end up submitting a bug to Mozilla.

  13. nemo says:

    WRT slow JS warnings.

    http://ejohn.org/blog/web-workers/
    ?
    “Normally in order to achieve any sort of computation using JavaScript you would need to break your jobs up into tiny chunks and split their execution apart using timers. This is both slow and unimpressive (since you can’t actually run anything in parallel – more information on this in How JavaScript Timers Work).

    With our current situation in mind, let’s dig in to Web Workers.
    Web Workers”

    Also supported in IE8, although of course no WebGL :)

  14. foobla says:

    all i get when i try the example at http://matrixmultiplygpu.appspot.com/sylvesterTest.html, is:
    uncaught exception: ERROR: 0:4: ‘i’ : Loop index cannot be initialized with non-constant expression ERROR: 0:4: ‘i’ : Loop index cannot be compared with non-constant expression

  15. giles says:

    Yup, the WebGL spec has changed a lot in the last 15 months and the example no longer works. Thanks for the heads-up, I’ve put an appropriate warning at the top of the post.

  16. psimon says:

    Another spec or Chrome implementation change:

    the “round” function is now a problem because it has become a builtin.

    I renamed it to round3D and it works again.

  17. nemo says:

    Looks like this example has suffered from changes over time.

    Timestamp: 04/25/2012 05:22:27 PM
    Error: uncaught exception: Fragment shader failed to compile with the following errors:
    ERROR: 0:7: error(#229) Overloaded functions must have the same return type void
    ERROR: 0:7: error(#230) Overloaded functions must have the same parameter qualifiers inout
    ERROR: error(#273) 2 compilation errors. No code generated

  18. giles says:

    Hi nemo — yes, it has. I’m hoping someone else will produce a fixed one!

  19. PseudoPsycho says:

    Just a comment to the last point.
    “If you want to use javascript for long running calculations, browsers will prompt the user every few seconds if they want to stop the script. Any way around this?”
    You can use Web Workers to speed up the script and to prevent this kind of prompt.
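A minimal sketch of the Web Worker approach, assuming a browser that supports Worker, Blob and URL.createObjectURL. The plain CPU multiply here is a stand-in for whatever long-running calculation is needed, and both function names are hypothetical. A worker moves the work off the main thread, which is what avoids the slow-script prompt; it does not by itself make the calculation faster.

```javascript
// Plain CPU matrix multiply on arrays of rows (stand-in workload).
function multiply(a, b) {
  const n = a.length, m = b[0].length, inner = b.length;
  const out = [];
  for (let i = 0; i < n; i++) {
    const row = new Array(m).fill(0);
    for (let k = 0; k < inner; k++) {
      for (let j = 0; j < m; j++) {
        row[j] += a[i][k] * b[k][j];
      }
    }
    out.push(row);
  }
  return out;
}

// Browser-only: run multiply() in a worker built from an inline script.
function multiplyInWorker(a, b) {
  return new Promise((resolve, reject) => {
    const source = multiply.toString() +
      "\nself.onmessage = function (e) {" +
      " self.postMessage(multiply(e.data.a, e.data.b)); };";
    const url = URL.createObjectURL(
      new Blob([source], { type: "text/javascript" }));
    const worker = new Worker(url);
    const cleanUp = () => { worker.terminate(); URL.revokeObjectURL(url); };
    worker.onmessage = (e) => { cleanUp(); resolve(e.data); };
    worker.onerror = (err) => { cleanUp(); reject(err); };
    worker.postMessage({ a: a, b: b });
  });
}
```

In a page this could be used as multiplyInWorker(a, b).then(function (result) { ... }). As with the setTimeout approach, the result arrives asynchronously, so a simple synchronous call like matrixOne.multiply(matrixTwo) is still not possible.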
