Computing, Cycling, horticulture, other stuff

Go faster with the same CPU…just throw the work to the GPU

by guestblogger “Beardy”

Most of you will be aware of nVidia’s CUDA technology. Essentially, nVidia is making a play for the hearts and minds of the HPC community by offering more bang for the buck without the need to buy mainframes.

If you recognise the following names (Opteron, UltraSPARC, EM64T, POWER, Cell/CBE), then you are probably in the market for the PC-universe equivalent of a turbocharger. Enter nVidia’s CUDA technology. If you have a PCI-E slot available and can cope with C, then have we got a deal for you! Have a read of the links at the end of the article for more details.

As always, the reviewers at Tom’s Hardware focus on raw processing capability (which is impressive). Also as usual, the limited time for testing meant they didn’t encounter, or at least didn’t report, the downsides of using CUDA. Aside from the need for better GPU cooling than the card’s maker probably supplied, there are a few other considerations that need to be accommodated.

There is no such thing as a free lunch. From personal experience running BOINC work on an nVidia 8600GT with 512MB, in a 3.2GHz Pentium IV with Hyper-Threading and 3GB of RAM, I can say that CUDA-enabled code churns through massive maths functions in a fraction of the time the system CPU takes, but there is a noticeable trade-off. While the system is grinding away on an intensive function, the display refresh is mind-numbingly slow… think back to old EGA-era ISA-bus cards at maximum resolution… ouch!
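A common mitigation (not something the setup above is confirmed to use) is to split one long kernel into many short launches. On GPUs of this generation a running kernel generally can’t be pre-empted to service the display, so shorter launches leave gaps for screen updates and also stay clear of the display driver’s watchdog timeout. The sketch below is a minimal illustration under those assumptions: `heavy_kernel` is a hypothetical workload whose inner loop stands in for real maths, and `run_chunked` and its chunk size are illustrative, not tuned values.

```cuda
#include <cuda_runtime.h>

// Hypothetical workload: each thread repeatedly updates one element.
// The inner loop is a stand-in for real number crunching.
__global__ void heavy_kernel(float *data, int offset, int count, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float x = data[offset + i];
        for (int k = 0; k < iters; ++k) {
            x = x * 0.999f + 0.001f;
        }
        data[offset + i] = x;
    }
}

// Process n elements in chunks of chunk_size (assumed > 0), one kernel
// launch per chunk. Each launch covers only one chunk, so it finishes
// quickly; synchronising after each launch also surfaces any error as
// soon as it happens. Returns the first error encountered, if any.
cudaError_t run_chunked(float *d_data, int n, int chunk_size, int iters) {
    const int threads = 256;
    for (int offset = 0; offset < n; offset += chunk_size) {
        int count = (n - offset < chunk_size) ? (n - offset) : chunk_size;
        int blocks = (count + threads - 1) / threads;
        heavy_kernel<<<blocks, threads>>>(d_data, offset, count, iters);
        cudaError_t err = cudaGetLastError();   // did the launch itself fail?
        if (err != cudaSuccess) return err;
        err = cudaDeviceSynchronize();          // did the kernel fail while running?
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

Smaller chunks mean a more responsive desktop at the cost of more launch overhead; the right size is a trade-off to measure on your own card.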

There is also an interesting problem with coding for CUDA-enabled cards. Ignoring the fact that only nVidia make ’em, some unusual scenarios arise. Those of us who loathe the try-catch exception paradigm may rarely write it ourselves, but if your compiler builds on an underlying function library that employs it, you will still hit it.

Specifically, consider the following scenario: an application is written as multi-threaded and multi-processor for grid-enabled operation. Bleeding-edge stuff for most developers. Now break from the SMP architecture and depend on DMA or some other memory-block handover technique to move datasets in and out of the modules. What happens if an exception is thrown in a CPU (or GPU) that is not under the same OS or system-management processor? Unless the code is written specifically to handle these cases, death spirals occur… you know the kind: “Dialog *this* has thrown an exception and needs to terminate. Press OK to abort”… wash, rinse, repeat… oops… um, sorry, how did you plan to handle the exception if the exception handler is the bit going *boom*? Uh oh.
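On the CUDA side, the practical consequence is that nothing running on the GPU can throw an exception back into host code: a failed launch or a fault inside a kernel shows up only as an error code, and only if the host asks for it. The sketch below uses a hypothetical `scale_kernel` and wrapper `scale_on_gpu` to show the two checks the CUDA runtime provides: `cudaGetLastError()` for launch errors and `cudaDeviceSynchronize()` for errors raised while the kernel ran.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element by a constant.
__global__ void scale_kernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launch the kernel and check both kinds of failure explicitly.
// The returned error code is the only signal the host gets; there is
// no exception to catch. threads_per_block is assumed to be > 0.
cudaError_t scale_on_gpu(float *d_data, int n, float factor, int threads_per_block) {
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    scale_kernel<<<blocks, threads_per_block>>>(d_data, n, factor);

    // Launch errors (e.g. an invalid block size) are reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Errors raised while the kernel was running are reported here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }
    return err;
}
```

For example, asking for more threads per block than the hardware allows makes `scale_on_gpu` return an error (typically `cudaErrorInvalidConfiguration`) instead of silently leaving the data unscaled.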

The above scenario occurs rather regularly in a FORTRAN program module (running on the host CPU) that links to a C module (running on the GPU). The unfortunate outcome is that the CPU-side module keeps waiting for a result block from the C module that never arrives (because that process has terminated unexpectedly), leaving memory invalid and returning a termination code that “looks” OK but in fact is not. The root cause in this case is a divide-by-zero in a complex maths routine that has no denominator check. No biggie *IF* the system were operating under an SMP architecture, but oops, no, we ain’t.
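One way to stop a bad denominator from taking the whole pipeline down is to check it inside the kernel and report the problem through a flag the host reads back, rather than relying on anything being thrown. The sketch assumes, purely for illustration, that the maths in question is complex-number division; `safe_complex_div`, `divide_on_gpu` and the zero sentinel are hypothetical names and choices, not the code from the project described above, and the host side is plain C++ rather than FORTRAN.

```cuda
#include <cuda_runtime.h>

// Element-wise complex division out[i] = num[i] / den[i].
// A zero denominator is not divided by: the thread writes a (0, 0)
// sentinel and raises err_flag so the host can see what happened.
__global__ void safe_complex_div(const float2 *num, const float2 *den,
                                 float2 *out, int n, int *err_flag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float2 a = num[i], b = den[i];
    float mag2 = b.x * b.x + b.y * b.y;  // |b|^2
    if (mag2 == 0.0f) {
        out[i] = make_float2(0.0f, 0.0f);
        atomicExch(err_flag, 1);
        return;
    }
    // a / b = a * conj(b) / |b|^2
    out[i] = make_float2((a.x * b.x + a.y * b.y) / mag2,
                         (a.y * b.x - a.x * b.y) / mag2);
}

// Host wrapper (n assumed > 0). Returns 0 on success, 1 if any
// denominator was zero, -1 if any CUDA call failed.
int divide_on_gpu(const float2 *h_num, const float2 *h_den, float2 *h_out, int n) {
    size_t bytes = (size_t)n * sizeof(float2);
    float2 *d_num = nullptr, *d_den = nullptr, *d_out = nullptr;
    int *d_flag = nullptr;
    int h_flag = 0;
    int status = -1;  // assume failure until every step succeeds
    do {
        if (cudaMalloc(&d_num, bytes) != cudaSuccess) break;
        if (cudaMalloc(&d_den, bytes) != cudaSuccess) break;
        if (cudaMalloc(&d_out, bytes) != cudaSuccess) break;
        if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) break;
        if (cudaMemcpy(d_num, h_num, bytes, cudaMemcpyHostToDevice) != cudaSuccess) break;
        if (cudaMemcpy(d_den, h_den, bytes, cudaMemcpyHostToDevice) != cudaSuccess) break;
        if (cudaMemset(d_flag, 0, sizeof(int)) != cudaSuccess) break;
        const int threads = 256;
        safe_complex_div<<<(n + threads - 1) / threads, threads>>>(d_num, d_den, d_out, n, d_flag);
        if (cudaGetLastError() != cudaSuccess) break;       // launch error
        if (cudaDeviceSynchronize() != cudaSuccess) break;  // execution error
        if (cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) break;
        if (cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) break;
        status = h_flag ? 1 : 0;
    } while (0);
    // cudaFree on a null pointer is a no-op, so this is safe on every path.
    cudaFree(d_num);
    cudaFree(d_den);
    cudaFree(d_out);
    cudaFree(d_flag);
    return status;
}
```

Because `divide_on_gpu` returns distinct codes for bad input (1) and a CUDA failure (-1), the calling module can tell the two apart instead of waiting on a result block that never arrives.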

“Your mileage may vary” is an understatement. CUDA is truly amazing for certain areas of endeavour, but like every other solution, it is no magic bullet. More importantly, on the back of Microsoft’s missive banishing memcpy() to its list of banned functions, the QUALITY of the code is critical to delivering any improvement.

“CUDA-Enabled Apps: Measuring Mainstream GPU Performance”
<http://www.tomshardware.com/reviews/nvidia-cuda-gpgpu,2299.html>

“Use your NVIDIA GPU for scientific computing” (BOINC)
<http://boinc.berkeley.edu/cuda.php>

“NVIDIA CUDA Compute Unified Device Architecture”
<http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf>
