Like last year, I had the honor of being invited to present PyCUDA and PyOpenCL, along with a few examples of their use, to a great crowd at Nvidia's GPU Technology Conference 2010.

Please click the following link to view the slides: pycuda-pyopencl-gtc-2010.pdf.
Update: Nvidia has posted a recording of the session. There's also a full list of sessions, with many talks that are worth watching. In particular, I'd like to recommend the ones by Bryan Catanzaro on Copperhead, which is built on top of PyCUDA, and by Tim Warburton on all things GPU-based discontinuous Galerkin. Also check out the poster on Atomic Hedgehog by Cyrus Omar.
At the recent PyCon Quattro, which took place in early May in the beautiful Tuscan city of Florence, Fabrizio Milo gave a talk on PyCUDA entitled
PyCuda: Come sfruttare la potenza delle schede video nelle applicazioni python (PyCUDA: How to make use of the power of graphics cards in Python applications)
He made a set of rather nice slides (in English), which may be of interest. They are downloadable in PDF form at the link.
Thanks, Fabrizio, for taking the time to talk about PyCUDA!
Quite often, I hear complaints that coding for GPUs is difficult. In response, I believe the discussion needs to be framed somewhat differently to put things in the right perspective.
First of all, squeezing the last drop of performance out of modern CPUs is hard, too. Here's a nice article on cache effects by Igor Ostrovsky that explains some of the phenomena one needs to take into account and the surprising things that can happen.
It just appears to me that on the CPU, fewer people care about good performance, whereas on the GPU, you admit that you do care simply by your choice of architecture. Not caring about CPU performance is not entirely unreasonable--you are somewhat likely to get 'average' performance even without detailed analysis. On the GPU, on the other hand, carelessly written code is not as likely to perform well.
So, in summary, my belief is that CPUs and GPUs can be equally difficult to understand; it's just that the potential payoff of caring about performance is much greater on one than on the other.
In my opinion, GPU computing is significant because I--as a grad student--can easily afford a machine that allows me to perform a simulation like the following in 40 minutes instead of a whole workday. That's why.
If you're curious, this shows the density of a vortex shedding flow behind a square obstacle at Re=100 and Ma=0.1. The attentive viewer may notice a sound wave at the beginning as the system settles from uniform flow to flow around the obstacle, as well as the passing of a gentle density "nudge" intended to throw the system off balance and accelerate the onset of shedding. This was computed using my Discontinuous Galerkin solver hedge on an Nvidia GTX 260.
This work owes a lot to Hendrik Riedmann from IAG, Uni Stuttgart who wrote the initial version of the Navier-Stokes operator in hedge.
(Btw: did you notice how the movie cleverly avoids the typical criticism of being "CFD"--colorful fluid dynamics? :-)
Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, Ahmed Fasih and I have recently submitted an article that explains how PyCUDA allows the user to do run-time code generation ("RTCG"), and how that is an enormous boon to implementation efforts of most high-performance codes. Among many other things, PyCUDA also underlies our efforts to bring discontinuous Galerkin PDE solvers onto the GPU.
Get it while it's hot: arXiv, Brown SC
Update: Fixed arXiv link.
High-performance scientific computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), and PyCUDA, an open-source toolkit that supports this technique.
In introducing PyCUDA, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. It is further observed that, compared to competing techniques, the effort required to create codes using run-time code generation with PyCUDA grows more gently in response to growing needs. The concept of RTCG is simple and easily implemented using existing, robust tools. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
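To make the idea of run-time code generation a bit more concrete, here is a minimal sketch, not taken from the paper. It builds CUDA C source text in Python, with the scaling constant and the amount of loop unrolling chosen at run time. The names `generate_scale_kernel`, `compile_kernel` and the kernel name `scale` are made up for this illustration. The generation step is plain Python; the optional compile step assumes PyCUDA and a CUDA-capable GPU are available.

```python
from string import Template

# Skeleton of the CUDA C kernel; $body is filled in at run time.
KERNEL_TEMPLATE = Template("""
__global__ void scale(float *dest, const float *src, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
$body
}
""")


def generate_scale_kernel(factor, unroll):
    """Return CUDA C source with `factor` baked in and the loop unrolled."""
    lines = []
    for i in range(unroll):
        # One guarded statement per unrolled iteration.
        lines.append(
            f"    if (tid + {i} * stride < n) "
            f"dest[tid + {i} * stride] = {float(factor)!r}f * src[tid + {i} * stride];"
        )
    return KERNEL_TEMPLATE.substitute(body="\n".join(lines))


def compile_kernel(source):
    """Compile generated source with PyCUDA (needs PyCUDA and a GPU)."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    from pycuda.compiler import SourceModule

    return SourceModule(source).get_function("scale")


if __name__ == "__main__":
    # Only the generation step runs here; no GPU is required.
    print(generate_scale_kernel(factor=2.0, unroll=3))
```

Because the kernel is just a string until it is compiled, parameters such as the unroll factor can be varied in an ordinary Python loop and each variant timed, which is the kind of tuning the abstract alludes to.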
This past week, I had the honor of presenting a talk on PyCUDA at Nvidia's inaugural GPU Technology Conference.

Please click the following link to view the slides: pycuda-nvidia.pdf.
Update: Nvidia has posted a recording of the session that you may watch or download.
Update 2: Giancarlo Colasante has transcoded the above video into just 16 MB. You may download the resulting video here.
Is it possible to take the instrumentals from one track, the vocals from another, and come up with something that you'd actually want to listen to? Turns out yes:
Amazing what computers can do these days. (via Holger Levsen)
The SciPy'09 conference ended less than a week ago. At the invitation of the SciPy'09 organizers (especially Fernando Perez), Nicolas Pinto gave a talk in the Advanced Tutorials track on how to use PyCUDA to do GPU scripting.
First, I would like to use this opportunity to publicly thank Nicolas for all the work and time he put into making this tutorial a reality. Second, I would like to point out the video of his session, which you can watch below:
As a math person, you're often faced with the task of communicating about math. Unfortunately, most modern means of communication, be it email, the web or instant messages, aren't really suited to typing math. Fortunately, however, many of these means do allow the use of Unicode, and Unicode allows for certain limited forms of mathematical typography.
Putting Unicode formulas together usually requires a fair amount of patience and some quality time with your favorite character map application. But now there's an easier way: The Unicode Input Helper--or "UIH". Here's an image of it in action:

Using it, you may use HTML entity names with backslashes (such as \int for an integral) to put together the basics of a formula, and then use a searchable list of all known Unicode characters to add the finishing touches. The screenshot gives you an idea. Once you've finished your masterpiece, simply use your computer's copy-and-paste function to get it to where it needs to be.
(Like PuDB, uih requires Ian Ward's urwid library.)
As a side benefit, I think uih makes for a nice replacement for pretty much every character map program--but its original purpose was easy typing of math.
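The backslash-plus-entity-name input described above can be approximated in a few lines of Python. This is only a sketch of the idea and not uih's actual implementation; the function name `expand_entities` is made up for this illustration. It relies on the standard library's table of HTML entity names.

```python
import re
from html.entities import name2codepoint


def expand_entities(text):
    """Replace \\name with the Unicode character for HTML entity `name`."""

    def replace(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        # Unknown names are left untouched.
        return match.group(0)

    return re.sub(r"\\([A-Za-z]+[0-9]*)", replace, text)


if __name__ == "__main__":
    print(expand_entities(r"\int \alpha(x) dx \le \infin"))
```

Names that are not HTML entities pass through unchanged, so ordinary text containing backslashes is not mangled.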
I'm happy to introduce PuDB, a full-screen, console-based visual debugger for Python that I recently cooked up.
You can install it simply by typing
easy_install pudb
into your Unix shell. Here's a screenshot of it in action:

Python has had decent debugging support for a while now, in the form of Pdb, Winpdb, and the debuggers built into various IDEs.
But I felt that there was a gap between these offerings--Pdb being very austere, and Winpdb and the IDEs being rather heavyweight. I wanted a comfortable debugger that's easily usable in a shell and doesn't require me to touch my mouse. PuDB uses Ian Ward's excellent Urwid library for its interaction with the console.
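For readers who want to try it, here is a small sketch of one way to drop into PuDB from within a program, using `pudb.set_trace()`. The function `mean` and its `debug` flag are made up for this illustration. PuDB is only imported when the flag is set, so the code runs unchanged on machines without it.

```python
def mean(values, debug=False):
    """Average a list of numbers, optionally stopping in PuDB first."""
    if debug:
        # Third-party import, only needed when debugging is requested.
        import pudb

        pudb.set_trace()  # opens the full-screen debugger at this point
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


if __name__ == "__main__":
    # Pass debug=True in a terminal to step through the loop in PuDB.
    print(mean([1, 2, 3]))
```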
Update: Looks like PuDB is slowly growing a community. There's now a mailing list to host discussions.