Sunday, September 21, 2008

Python list (in Cython) vs. NumPy

Taking my previous benchmark a little further I decided to see how well iterating over a Python list of doubles compares with using NumPy arrays. Here is an extremely simple example that implements the sum function in Cython and compares the result with NumPy's sum method.

Here is the Cython code:

# --- csum.pyx ---
def csum(list array):
    cdef int i, N = len(array)
    cdef double x, s = 0.0
    for i in range(N):
        x = array[i]
        s += x
    return s
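
For comparison, the plain-Python function this is speeding up would look like the sketch below; the point of the Cython version above is that typing i, N, x and s lets the loop run on unboxed C doubles instead of Python float objects.

```python
def pysum(array):
    """Plain-Python equivalent of csum: sum a list of floats."""
    s = 0.0
    for x in array:
        s += x
    return s

print(pysum([0.0, 0.25, 0.75]))  # 1.0
```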


Here is a setup.py to build it (run python setup.py build_ext --inplace to build in place):

# --- setup.py ---
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(cmdclass={'build_ext': build_ext},
      ext_modules=[Extension("csum", ["csum.pyx"])])


To time it in IPython I created a simple file called test.ipy like so:

# --- test.ipy ---
import csum
import numpy

for i in [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 250000000]:
    print '-'*80
    print 'N =', i
    a = numpy.linspace(0, 1, i)
    b = a.tolist()
    print "Cython:",
    %timeit csum.csum(b)
    print "NumPy:",
    %timeit a.sum()
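
Outside IPython, the same comparison can be sketched with the stdlib timeit module. This is only a sketch: the builtin sum stands in for the compiled csum, so the absolute numbers will not match the table below.

```python
import timeit

for n in (10, 1000, 100000):
    setup = "import numpy; a = numpy.linspace(0, 1, %d); b = a.tolist()" % n
    t_list = timeit.timeit("sum(b)", setup=setup, number=100)
    t_numpy = timeit.timeit("a.sum()", setup=setup, number=100)
    # divide by 100 calls, convert seconds to microseconds
    print("N=%d  list sum: %.3g us  numpy sum: %.3g us"
          % (n, t_list * 1e4, t_numpy * 1e4))
```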


I ran it using IPython; here are the results (lightly reformatted):

N           Cython        NumPy
10          534 ns        10.1 micros
100         1.76 micros   10.8 micros
1000        15.3 micros   19.3 micros
10000       150 micros    101 micros
100000      1.75 ms       933 micros
1000000     19.7 ms       9.24 ms
10000000    198 ms        92.1 ms
25000000    499 ms        231 ms


This was done on a 3 GHz Pentium 4, and clearly lists do quite well: at small sizes they outperform NumPy, and at really large sizes they are only about twice as slow. This is very good considering how general lists are.
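
The crossover makes sense: a.sum() carries fixed per-call overhead that dominates at tiny N, while at large N the list loop pays to unbox every element from a Python float object into a C double. A quick sketch of the storage difference (the object size is a CPython implementation detail and will vary by build):

```python
import sys
import numpy

a = numpy.linspace(0, 1, 1000)
b = a.tolist()
# each list element is a full Python float object...
print(sys.getsizeof(b[0]))   # e.g. 24 bytes on 64-bit CPython
# ...while the array stores raw 8-byte C doubles
print(a.itemsize)            # 8
```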

6 comments:

Unknown said...

hi,

note that for 'silly benchmarks' like that, psyco is an easy alternative:

# --- psyco_sum.py ---
import psyco

@psyco.proxy
def csum(array):
    s = 0.0
    for i in xrange(len(array)):
        s += array[i]
    return s
# EOF

I got some rather 'same ballpark-ish' results for psyco on my box...
(interestingly enough, proxying a function which just returns sum(x) is somewhat slower...)

cheers,
sebastien.

david said...

To me, it also shows how much better we can still get in numpy.

The numbers for big arrays are pretty good: a direct pure-C implementation of sum takes ~8 cycles per item; but then again, you and I both have a Pentium 4, which is particularly bad at floating-point computation if you don't use SSE.

To get much better results for numpy, we would have to use assembly (summing through ATLAS already gives me 0.05 instead of 0.095 ms for the 1e7-long array, and my CPU is a 3.2 GHz P4).

Unknown said...

You should replace

    for i in range(N):

by

    for i from 0 <= i < N:

which is much faster.

Nathan said...

FWIW this could be valuable in speeding up scipy.sparse.lil_matrix, which is painfully slow at the moment.

Leo Goodstadt said...

Came across your blog playing with Cython.
My experience has been the opposite. List performance is horribly slow.

Try running the Cython-with-NumPy code for comparison:

import cython
cimport numpy as np
import numpy as np

@cython.boundscheck(False)
def csum2(np.ndarray[double, ndim=1, mode="c"] parray):
    cdef:
        unsigned int i
        unsigned int N = len(parray)
        double x, s = 0.0
    for i in range(N):
        x = parray[i]
        s += x
    return s
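
A note on the mode="c" declaration in csum2: it restricts the argument to C-contiguous arrays, so strided views have to be copied before being passed in. A sketch of what that entails (csum2 itself needs compiling, so only the contiguity checks run here):

```python
import numpy as np

a = np.linspace(0, 1, 11)   # freshly created arrays are C-contiguous
assert a.flags['C_CONTIGUOUS']

s = a[::2]                  # a strided view is NOT C-contiguous
assert not s.flags['C_CONTIGUOUS']

# csum2 would reject s; make a contiguous copy first:
c = np.ascontiguousarray(s)
assert c.flags['C_CONTIGUOUS']
```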