## Python meets C: Cython

By Tim on Thursday 17 November 2011, 22:35 - code

This bit is about optimizing Python code using something that's closer to the metal (i.e. the CPU/GPU). Before you do anything about optimization, realize this:

Premature optimization is the root of all evil

- Donald Knuth

If you still think you need to optimize your code, read on.

A few years ago I used weave to speed up some parts of my code. Back then I was annoyed by the rather sparse documentation. Things seem to have changed since then, and Cython (wiki) looks like a good candidate.

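Cython lets you write C-like code, with static type declarations, right alongside your Python code; it is compiled to a C extension module that you import like any other module. In some cases, this more fine-grained control over how the program is executed can give you a significant speed increase. For tight numerical loops the gain over pure Python can be several hundred-fold (see the results below), although the gain over vectorized NumPy is much smaller. See for example this (dated) overview. With Cython you can get near-C speeds while retaining the flexibility of Python code, which you can see below.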

I stumbled across a nice read about speed optimisation with Cython. I encourage you to read the post yourself, but here are the code and the impressive results for Python, NumPy and Cython:

## Pure Python

    from numpy import zeros

    dx = 0.1
    dy = 0.1
    dx2 = dx*dx
    dy2 = dy*dy

    def py_update(u):
        # One in-place sweep over the interior points of the grid.
        nx, ny = u.shape
        for i in range(1, nx-1):
            for j in range(1, ny-1):
                u[i,j] = ((u[i+1, j] + u[i-1, j]) * dy2 +
                          (u[i, j+1] + u[i, j-1]) * dx2) / (2*(dx2+dy2))

    def calc(N, Niter=100, func=py_update, args=()):
        # N x N grid with the first row held at 1; apply func Niter times.
        u = zeros([N, N])
        u[0] = 1
        for i in range(Niter):
            func(u, *args)
        return u

## NumPy implementation

    def num_update(u):
        # Same stencil as py_update, written with array slices instead of loops.
        u[1:-1,1:-1] = ((u[2:,1:-1] + u[:-2,1:-1]) * dy2 +
                        (u[1:-1,2:] + u[1:-1,:-2]) * dx2) / (2*(dx2+dy2))
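
As a sanity check, here is a minimal self-contained sketch that runs one sweep of each update on a small grid and times them with `timeit`. The helper `make_grid` and the grid size are assumptions for illustration and are not from the linked post. One thing it makes visible: the loop updates `u` in place and so picks up neighbours that were already updated in the same sweep, while the slice expression builds its whole right-hand side from the old values first. The two therefore differ after a single sweep, even though both relax towards the same solution.

```python
import timeit

import numpy as np

dx2 = dy2 = 0.1 * 0.1


def py_update(u):
    """One in-place sweep with explicit Python loops."""
    nx, ny = u.shape
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i, j] = ((u[i + 1, j] + u[i - 1, j]) * dy2 +
                       (u[i, j + 1] + u[i, j - 1]) * dx2) / (2 * (dx2 + dy2))


def num_update(u):
    """One sweep using NumPy slices; the right-hand side is evaluated first."""
    u[1:-1, 1:-1] = ((u[2:, 1:-1] + u[:-2, 1:-1]) * dy2 +
                     (u[1:-1, 2:] + u[1:-1, :-2]) * dx2) / (2 * (dx2 + dy2))


def make_grid(n):
    """Hypothetical helper: n x n grid of zeros with the first row held at 1."""
    u = np.zeros((n, n))
    u[0] = 1.0
    return u


n = 50  # small on purpose so the pure-Python loop finishes quickly
u_loop, u_vec = make_grid(n), make_grid(n)
py_update(u_loop)
num_update(u_vec)

# After one sweep the loop has already propagated information into the
# second interior row, while the slice version has not.
same_after_one_sweep = np.allclose(u_loop, u_vec)

# Each timed call builds a fresh grid and applies one sweep to it.
t_loop = timeit.timeit(lambda: py_update(make_grid(n)), number=5)
t_vec = timeit.timeit(lambda: num_update(make_grid(n)), number=5)
print(f"identical after one sweep: {same_after_one_sweep}")
print(f"loop: {t_loop:.4f} s, NumPy: {t_vec:.4f} s for 5 sweeps")
```

The absolute timings depend on the machine and the grid size, so treat them as a rough indication only.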

## Cython code

This one is called 'Faster Cython' in the linked blog post.

    #cython: boundscheck=False
    #cython: wraparound=False

    cimport numpy as np

    def cy_update(np.ndarray[double, ndim=2] u, double dx2, double dy2):
        # Typed loop indices let Cython turn these loops into plain C loops.
        cdef unsigned int i, j
        for i in range(1, u.shape[0]-1):
            for j in range(1, u.shape[1]-1):
                u[i,j] = ((u[i+1, j] + u[i-1, j]) * dy2 +
                          (u[i, j+1] + u[i, j-1]) * dx2) / (2*(dx2+dy2))

Saved as _laplace.pyx, this is compiled and imported into the Python program with pyximport:

    import pyximport
    import numpy as np

    # Compile .pyx files on import; the NumPy headers are needed for "cimport numpy".
    pyximport.install(setup_args={'include_dirs': [np.get_include()]})

    from _laplace import cy_update as cy_update2
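
The `args` parameter of `calc` is what lets one driver run both kinds of update: `py_update` and `num_update` read `dx2` and `dy2` from module globals, while `cy_update` takes them as arguments. Below is a minimal sketch of that wiring, assuming `_laplace.pyx` sits next to the script. The fallback `num_update_args` is a hypothetical stand-in (not from the linked post) that is used when the compiled module cannot be imported.

```python
import numpy as np

dx2 = dy2 = 0.1 * 0.1


def num_update_args(u, dx2, dy2):
    """Hypothetical NumPy fallback with the same signature as cy_update."""
    u[1:-1, 1:-1] = ((u[2:, 1:-1] + u[:-2, 1:-1]) * dy2 +
                     (u[1:-1, 2:] + u[1:-1, :-2]) * dx2) / (2 * (dx2 + dy2))


try:
    # Compile and import _laplace.pyx on the fly, as in the post.
    import pyximport
    pyximport.install(setup_args={"include_dirs": [np.get_include()]})
    from _laplace import cy_update as update
except ImportError:
    # Cython or _laplace.pyx is not available: fall back to NumPy.
    update = num_update_args


def calc(N, Niter=100, func=update, args=()):
    """Run Niter sweeps of func on an N x N grid with the first row held at 1."""
    u = np.zeros([N, N])
    u[0] = 1
    for _ in range(Niter):
        func(u, *args)
    return u


# The extra arguments are forwarded to the update function on every sweep.
u = calc(50, Niter=20, func=update, args=(dx2, dy2))
print(u[1, 1:4])
```

Passing the grid spacings through `args` keeps the driver unaware of which implementation it is calling, which is convenient when timing several variants against each other.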

## Performance results

| Method | Time (sec) | Relative time (NumPy = 1) |
|---|---|---|
| Pure Python | 560 | 250 |
| NumPy | 2.24 | 1 |
| Cython | 1.28 | 0.57 |
| Weave | 1.02 | 0.45 |
| Faster Cython | 0.94 | 0.42 |
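
The relative column is each time divided by the NumPy time. Here is a small sketch of that arithmetic using the times from the table above (the dictionary name `times` is just for illustration). Because the published times are already rounded, the last digit of a recomputed ratio can differ slightly from the table.

```python
# Times in seconds, copied from the table above.
times = {
    "Pure Python": 560.0,
    "NumPy": 2.24,
    "Cython": 1.28,
    "Weave": 1.02,
    "Faster Cython": 0.94,
}

baseline = times["NumPy"]
relative = {name: t / baseline for name, t in times.items()}

for name, ratio in relative.items():
    print(f"{name:14s} {times[name]:7.2f} s   {ratio:7.2f} x NumPy time")
```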

So NumPy is already quite fast (about 250 times faster than the pure Python loops here), but you can squeeze roughly another factor of two out of your CPU with Cython, while still writing Python-like code and without having to care about the memory management of NumPy arrays etc.

## References

- A presentation on Python optimisation
- Speed comparison between Python, NumPy, Matlab and Fortran. It's done by NASA so it must be true.
- NumPy vs Matlab