In a previous article we began speculating about the impact that the new Nvidia Tesla boards might have on the proliferation of HPC in the manufacturing space, and this started a discussion about the difficulties of programming GPUs. This week I had the chance to "sit down" (we were standing and walking in the Nvidia booth, actually) with Sumit Gupta of Nvidia to talk a bit about CUDA programming.
CUDA extends C with a handful of keywords for expressing parallelism, making it a potentially more intuitive model for parallel programming. "It will not make parallel programming easy," said Gupta, "but it does make it easier." Like many other vendors selling parallel programming tools and languages, Nvidia has its stories about high school kids successfully using them as proof that they are easy. I am not buying that as a valid measure. I know high school students who can do things with a computer that even I can't. I say give the tools to an old-school C programmer on the plant floor. If he can use them, THEN they are easy ;-)
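To give a flavor of what those keywords look like, here is a minimal sketch of an element-wise vector add. It is my own illustration, not code from Nvidia or from the interview: the kernel name `add` and the wrapper `add_on_gpu` are made up for the example, and error handling is deliberately thin. It assumes a machine with the CUDA toolkit (`nvcc`) and a CUDA-capable GPU.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// __global__ is one of the keywords CUDA adds to C: it marks a function
// (a "kernel") that runs on the GPU and is launched from the host.
__global__ void add(const float *a, const float *b, float *c, int n)
{
    // Built-in variables tell each thread where it sits in the launch.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];  // one thread handles one element
    }
}

// Hypothetical host wrapper (name is an assumption for this sketch):
// copy the inputs to the GPU, run the kernel, copy the result back.
cudaError_t add_on_gpu(const float *a, const float *b, float *c, int n)
{
    const size_t bytes = n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    // The <<<blocks, threads>>> syntax is the other visible extension:
    // it says how many threads to launch for this call.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(da, db, dc, n);

    // This blocking copy waits for the kernel to finish before returning.
    cudaError_t err = cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return err;
}

int main()
{
    const int n = 1024;
    float a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
        c[i] = 0.0f;
    }

    cudaError_t err = add_on_gpu(a, b, c, n);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    assert(c[10] == 30.0f);  // 10 + 20
    printf("c[10] = %.1f\n", c[10]);
    return 0;
}
```

Apart from `__global__`, the built-in index variables and the `<<<...>>>` launch, everything here is ordinary C-style code, which is roughly the point Gupta was making. Something like `nvcc add.cu -o add` should build it, where the file name is my own choice.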
It is important to note up front that the CUDA/Tesla pairing is best suited to problems that are embarrassingly parallel in nature and can easily be spread over 30,000 threads. The speedup comes from hiding memory latency: while some threads wait on memory, the hardware runs others. If you have too few threads, the latency is exposed and the speed gains are not realized. By the same token, if your threads need to share data, they will collide over it and much of the gain can disappear.
If your problem, like Helen of Troy, can launch a thousand threads, then the advantage of Tesla is that it was built to handle the thread management for you. As Gupta said, "launch as many threads as you can and let the hardware handle it." There is a very active user forum at the CUDA Zone where people share code, examples and advice. The forums are populated not only by users but also by Nvidia employees, who help users solve problems and learn the programming environment.
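As a rough sketch of what "launch as many threads as you can" looks like in practice, the example below gives each of about a million array elements its own thread and leaves the scheduling to the hardware. The problem (a SAXPY-style update), the sizes, and the helper name `blocks_for` are my own assumptions for illustration, not anything Gupta showed me. No element depends on any other, which is the embarrassingly parallel case described above.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: how many blocks are needed so that every one of the
// n elements gets its own thread (rounds up when n is not a multiple).
int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

// y[i] = a * x[i] + y[i]. Each thread touches only its own element,
// so the threads never share or fight over data.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

int main()
{
    const int n = 1 << 20;  // 1,048,576 elements
    const int threads = 256;
    const int blocks = blocks_for(n, threads);
    assert(blocks == 4096);  // 4096 blocks of 256 threads: one thread per element

    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const size_t bytes = n * sizeof(float);
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);

    // Far more threads than the GPU has cores; the hardware decides which
    // ones run while others are waiting on memory.
    saxpy<<<blocks, threads>>>(n, 3.0f, dx, dy);

    // The blocking copy waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("launched %d blocks of %d threads\n", blocks, threads);
    return 0;
}
```

Note that the programmer never says which thread runs when or where. The code states how many threads exist and what each one does, and the rest is left to the board.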
Some of you are ubergeeks who enjoy hacking code and will probably show up on the Nvidia forums, but for most plant floor folks this new technology is not really useful until it shows up in commercial off-the-shelf (COTS) software. At the show this week, Ansys was demonstrating a mechanical library that had been ported to CUDA, and The Portland Group announced that the latest version of its PGI tool would include "Provisional support for x64+GPU on 64-bit Linux for CUDA-enabled NVIDIA GPUs using the new high-level PGI Accelerator Compilers programming model".
Like all other IT solutions, this is not a silver bullet, and the ecosystem of software support is still young, but it is an area to watch closely. It may even have an application in your plant today.