Loopy lets you easily generate the tedious, complicated code that is necessary to get good performance out of GPUs and multi-core CPUs.

Loopy’s core idea is that a computation should be described simply and then transformed into a version that gets high performance. This transformation takes place under user control, from within Python.

It can capture the following types of optimizations:

  • Vector and multi-core parallelism in the OpenCL/CUDA model
  • Data layout transformations (structure of arrays to array of structures)
  • Loop unrolling
  • Loop tiling with efficient handling of boundary cases
  • Prefetching/copy optimizations
  • Instruction-level parallelism
  • and many more
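
As a rough illustration of what one of these transformations does, the sketch below writes the same computation twice in plain Python: once as a simple loop, and once with the loop tiled and the boundary case handled separately. This is not Loopy's API and not code that Loopy generates. The function names `scale_simple` and `scale_tiled` and the tile size are hypothetical and chosen only for this example.

```python
def scale_simple(a):
    """Simple description: one plain loop over every element."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = 2.0 * a[i]
    return out


def scale_tiled(a, tile=4):
    """Same computation with the loop split into tiles of size `tile`."""
    n = len(a)
    out = [0.0] * n
    n_full = n // tile  # number of complete tiles

    # Full tiles: the inner loop has a fixed trip count and needs no bounds check.
    for i_outer in range(n_full):
        base = i_outer * tile
        for i_inner in range(tile):
            out[base + i_inner] = 2.0 * a[base + i_inner]

    # Boundary case: leftover elements when n is not a multiple of tile.
    for i in range(n_full * tile, n):
        out[i] = 2.0 * a[i]
    return out


# Both versions must agree, including when the length is not a multiple of the tile size.
data = [float(i) for i in range(10)]
assert scale_tiled(data, tile=4) == scale_simple(data)
```

The tiled version is longer and easier to get wrong than the simple one, which is the kind of tedium the list above refers to. In Loopy, the simple form is what the user writes, and a transformation requested from Python produces the tiled form.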

Loopy targets array-type computations, such as the following:

  • dense linear algebra,
  • convolutions,
  • n-body interactions,
  • PDE solvers, such as finite element, finite difference, and Fast-Multipole-type computations

It is not (and does not want to be) a general-purpose programming language.


See the Loopy Documentation.


Having trouble with Loopy? If the documentation doesn’t help, maybe the nice people on the PyOpenCL mailing list can help.


Download Loopy here

(Note that there is an extra period in Loopy’s name on the Python package index, compared to its module name.)

Its git repository is available on

Prerequisites: PyOpenCL.

Loopy is licensed under the liberal [MIT license](http://en.wikipedia.org/wiki/MIT_License) and is free for commercial, academic, and private use.