
OpenCL - the compute intermediate language

We are now fast approaching the yearly Game Developers Conference, and around this time last year my favourite topic of conversation was the need for a "virtual ISA" that would span current and future processor architectures, particularly GPUs. The term "virtual ISA" implies an assembly-like language and toolset that can be used to generate high-performance (most likely data-parallel) code without being tied to a specific architecture. This is much the same as the virtual ISA that LLVM provides for a wide variety of (primarily) CPU architectures.

Even a year on, it remains such a simple idea that, once stated, it is a wonder it still doesn't exist. The main change in the conversation is that it has become clearer that this is the role OpenCL should attempt to fill.

Why do we need a virtual ISA, and why not another language?

Right now, the reality is that no one knows the best way to efficiently program the emerging heterogeneous architectures. In fact, I don't think we even understand how to program "normal" GPUs yet. C++ will inevitably be crowbarred into some form of working shape, but I'd rather not settle for this as a long term solution.

A compute software stack sounds like a far more attractive option, and to build a well-functioning stack you need a solid foundation. As an illustration of what happens without one, observe the rather scary rise in the number of compiler-like software projects that generate CUDA C as their output. This is not a use case CUDA C was designed for, and as OpenCL is essentially a clone of CUDA C, you may well think this is a trend future OpenCL revisions should pay attention to.
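To make the pattern concrete, here is a toy sketch (all names invented, not from any real project) of what such code generators end up doing: pasting an expression into a CUDA C template, because kernel source text is the only portable interface on offer.

```python
# Toy illustration of a DSL backend that lowers a simple elementwise
# expression to CUDA C source text. Real projects are far more involved,
# but the shape - emitting C source for lack of a portable compute IR -
# is the same.

def emit_cuda_kernel(name, expr):
    """Wrap an elementwise C expression over x[i] in a CUDA kernel string."""
    return (
        f"__global__ void {name}(const float* x, float* out, int n) {{\n"
        f"    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"    if (i < n) out[i] = {expr};\n"
        f"}}\n"
    )

# "Compile" a trivial expression into kernel source, ready to be handed
# to nvcc or the CUDA driver API by the host-side build step.
kernel_src = emit_cuda_kernel("scale2", "2.0f * x[i]")
print(kernel_src)
```

A virtual ISA would let tools like this emit a stable, portable target directly, instead of round-tripping through a vendor's C dialect and its compiler.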

In fact, there are a few other interesting trends to note.

A change in the wind?

A significant strength of the CUDA architecture is PTX, the NVIDIA-specific assembly-like language that CUDA C compiles to. OpenCL does not yet have an equivalent. Several independent little birdies tell me the committee felt that attempting to define one as part of the initial OpenCL standard could have derailed the process before OpenCL had a chance to find its feet. However, more birdies tell me that this view has now mostly changed, and that defining something akin to a virtual ISA could actually be back on the cards.

PTX as an unofficial standard?

As nice as PTX is, it is not a GPU version of LLVM. Current support is limited to a single PDF reference and an assembler tool. The very impressive GPU ray tracer OptiX uses PTX as its target, but its authors do have the significant advantage of residing inside NVIDIA HQ. The compiler toolkit that makes LLVM so attractive is missing, and PTX targets only NVIDIA hardware - although this itself is an interesting area for change, and one where I feel NVIDIA could gain a lot of momentum. One project I will be keeping a close eye on is GPU-Ocelot, which appears to go a fair way towards addressing the shortcomings of PTX. While PTX may not be an "official" standard, much like Adobe Flash, it could establish itself as one anyway.

LLVM as a data parallel ISA?

Given the close parallels, we should seriously question whether LLVM can support GPU-like architectures. As a reasonably mature project, LLVM already has a lot in its favour and can point to some successes in this area: notably its use in the Mac OS X OpenGL implementation, the Gallium3D graphics driver project, and an experimental PTX backend. I spoke with a few people who I know have seriously considered this option, and found the jury still out. I haven't used LLVM in sufficient anger to come to a decision of my own.
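As a flavour of what a data-parallel kernel might look like at this level, here is a hypothetical saxpy-style kernel sketched in LLVM IR. The thread-index intrinsic shown belongs to LLVM's NVIDIA-specific PTX backend, which is precisely the portability gap a cross-vendor virtual ISA would have to close; treat this as an illustrative sketch, not the output of any real compiler.

```llvm
; Hypothetical elementwise kernel: y[tid] = a * x[tid] + y[tid].
; One "thread" per element, as in the CUDA/OpenCL execution model.

; NVIDIA-specific intrinsic - the non-portable part.
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()

define void @saxpy(float %a, float* %x, float* %y) {
entry:
  ; Which element this thread handles.
  %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %px  = getelementptr float, float* %x, i32 %tid
  %py  = getelementptr float, float* %y, i32 %tid
  %xv  = load float, float* %px
  %yv  = load float, float* %py
  %m   = fmul float %a, %xv
  %r   = fadd float %m, %yv
  store float %r, float* %py
  ret void
}
```

Everything here except that one intrinsic is plain, target-independent LLVM IR, which is why the question of LLVM as a data-parallel ISA is worth asking at all.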

One obvious obstacle: what would LLVM itself compile to? I would expect a serious cross-vendor virtual ISA to have direct support from all hardware manufacturers. A level of indirection through LLVM, or any other third-party ISA, is unlikely to gain sufficient ground if it is not explicitly supported by every vendor through a standard such as OpenCL.

Demand and evolution

With relatively little movement in the industry for a year or more, I do occasionally wonder whether I have misread the demand for a virtual ISA. But not for long! Apart from the clear advantages for code generation and domain-specific language implementations (a topic I'm very interested in), a virtual ISA should become the target bytecode for GPU and HPC languages and APIs such as OpenGL, OpenCL, DirectX, DirectCompute, CUDA and their successors. It's a long and growing list.

While uncertainty surrounds the best way to program your GPU, we can expect to see unhealthy but unavoidable amounts of biodiversity. But if we want to prevent our industries from painfully diverging, we should at least agree on a foundational virtual ISA that we can unify behind and build on.