Loopy lets you easily generate the tedious, complicated code that is necessary to get good performance out of GPUs and multi-core CPUs.

Loopy's core idea is that a computation should be described simply and then
*transformed* into a version that gets high performance. This transformation
takes place under user control, from within Python.

It can capture the following types of optimizations:

- Vector and multi-core parallelism in the OpenCL/CUDA model
- Data layout transformations (structure of arrays to array of structures)
- Loop unrolling
- Loop tiling with efficient handling of boundary cases
- Prefetching/copy optimizations
- Instruction level parallelism
- and many more
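
To make one of these concrete, here is a minimal plain-Python sketch of what loop tiling with a boundary case means. It does not use Loopy's API and is not the code Loopy generates. The function names `scale_simple` and `scale_tiled` and the `tile` parameter are hypothetical, chosen only for illustration. The per-element guard shown here is the naive way to handle the last, partial tile.

```python
def scale_simple(a, factor):
    """Simple description of the computation: one loop over every element."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = factor * a[i]
    return out


def scale_tiled(a, factor, tile=4):
    """The same computation with the loop split into tiles of size `tile`.

    Hypothetical illustration only; not Loopy's API or its generated code.
    """
    n = len(a)
    out = [0.0] * n
    # Round up so that a final, partially filled tile is still visited.
    num_tiles = (n + tile - 1) // tile
    for i_outer in range(num_tiles):
        for i_inner in range(tile):
            i = i_outer * tile + i_inner
            # Boundary guard: skip indices past the end in the last tile.
            if i < n:
                out[i] = factor * a[i]
    return out


# 10 elements with a tile size of 4 leaves a partial last tile of 2 elements.
data = [float(x) for x in range(10)]
assert scale_tiled(data, 2.0) == scale_simple(data, 2.0)
```

Both functions compute the same result; only the loop structure differs. Writing and maintaining the second form by hand for every kernel is the kind of tedium the list above refers to.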

Loopy targets array-type computations, such as the following:

- dense linear algebra,
- convolutions,
- n-body interactions,
- PDE solvers, such as finite element, finite difference, and Fast-Multipole-type computations

It is not (and does not want to be) a general-purpose programming language.

# Documentation

See the Loopy Documentation.

# Support

Having trouble with Loopy? Maybe the nice people on the Loopy mailing list can help.

# Download

See also the Installation section of the Documentation.

(Note that there is an extra period in Loopy's name on the Python package index, compared to its module name.)

A link to a prebuilt binary is also available from the front page of Loopy's documentation.

Its git repository is available on

Prerequisites:

See conda forge for prebuilt packages of islpy and PyOpenCL.

Loopy is licensed under the liberal [MIT license](http://en.wikipedia.org/wiki/MIT_License) and free for commercial, academic, and private use.