Here is the Cython code:

```cython
# --- csum.pyx ---

def csum(list array):
    cdef int i, N = len(array)
    cdef double x, s = 0.0
    for i in range(N):
        x = array[i]
        s += x
    return s
```
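For a baseline, the same summation in plain Python (no Cython types; `pysum` is just an illustrative name, not from the original post) looks like this:

```python
def pysum(array):
    """Plain-Python equivalent of csum, for baseline timing comparisons."""
    s = 0.0
    for x in array:
        s += x
    return s

print(pysum([0.0, 0.25, 0.5, 0.75, 1.0]))  # prints 2.5
```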

Here is a setup.py to build it:

```python
# --- setup.py ---

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(cmdclass={'build_ext': build_ext},
      ext_modules=[Extension("csum", ["csum.pyx"])])
```
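With this setup.py in place, the extension is typically built in place with the standard distutils command, so that `import csum` works from the same directory:

```
python setup.py build_ext --inplace
```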

To time it in IPython I created a simple file called test.ipy like so:

```python
# --- test.ipy ---

import csum
import numpy

for i in [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 250000000]:
    print '-'*80
    print 'N =', i
    a = numpy.linspace(0, 1, i)
    b = a.tolist()
    print "Cython:",
    %timeit csum.csum(b)
    print "NumPy:",
    %timeit a.sum()
```

I run it using IPython and here are the results (formatted a little):

| N | Cython | NumPy |
|---|--------|-------|
| 10 | 534 ns | 10.1 µs |
| 100 | 1.76 µs | 10.8 µs |
| 1000 | 15.3 µs | 19.3 µs |
| 10000 | 150 µs | 101 µs |
| 100000 | 1.75 ms | 933 µs |
| 1000000 | 19.7 ms | 9.24 ms |
| 10000000 | 198 ms | 92.1 ms |
| 25000000 | 499 ms | 231 ms |

This was done on a 3 GHz Pentium 4, and clearly lists do quite well: at small sizes they outperform NumPy, and at very large sizes they are only about twice as slow. This is very good considering how general lists are.
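The timings above can also be reproduced without IPython's `%timeit` magic, using the stdlib `timeit` module. A minimal sketch (assuming NumPy is installed; the built-in `sum` stands in for the compiled `csum` here):

```python
import timeit
import numpy

N = 100000
a = numpy.linspace(0, 1, N)
b = a.tolist()

# built-in sum over the list stands in for the compiled csum
t_list = timeit.timeit(lambda: sum(b), number=10) / 10
t_numpy = timeit.timeit(lambda: a.sum(), number=10) / 10
print("list: %.2e s/loop, numpy: %.2e s/loop" % (t_list, t_numpy))
```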

## 6 comments:

hi,

note that for 'silly benchmarks' like that, psyco is an easy alternative:

```python
# --- psyco_sum.py ---

import psyco

@psyco.proxy
def csum(array):
    s = 0.0
    for i in xrange(len(array)):
        s += array[i]
    return s

# EOF
```

I got some rather 'same ballpark-ish' results for psyco on my box...

(interestingly enough, proxying a function which just returns sum(x) is somewhat slower...)

cheers,

sebastien.

To me, it also shows how much better we can still get in numpy.

The numbers for big arrays are pretty good: a direct pure-C implementation of sum takes ~8 cycles/item; but then again, you and I both have a Pentium 4, which is particularly bad at floating-point computation if you don't use SSE.

To get much better results for NumPy, we would have to use assembly (summing through ATLAS already gives me 0.05 s instead of 0.095 s for the 1e7-long array, and my CPU is a 3.2 GHz P4).

You should replace

```cython
for i in range(N):
```

with

```cython
for i from 0 <= i < N:
```

which is much faster.
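Applied to csum above, the suggested rewrite (using Cython's older for-from loop syntax) would read as follows. Note that current Cython versions already compile `range(N)` over a typed integer index down to the same C loop, so this rewrite mostly mattered for older releases:

```cython
def csum(list array):
    cdef int i, N = len(array)
    cdef double x, s = 0.0
    for i from 0 <= i < N:   # C-level loop, no Python range object
        x = array[i]
        s += x
    return s
```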

FWIW this could be valuable in speeding up scipy.sparse.lil_matrix, which is painfully slow at the moment.

Came across your blog playing with Cython.

My experience has been the opposite. List performance is horribly slow.

Try running the Cython-with-NumPy code for comparison:

```cython
import cython
cimport numpy as np
import numpy as np

@cython.boundscheck(False)
def csum2(np.ndarray[double, ndim=1, mode="c"] parray):
    cdef:
        unsigned int i
        unsigned int N = len(parray)
        double x, s = 0.0
    for i in range(N):
        x = parray[i]
        s += x
    return s
```
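As a quick sanity check that the list-based and array-based versions compute the same thing, one can compare a plain Python sum with NumPy's sum directly (a sketch: `sum(b)` and `a.sum()` stand in for the compiled `csum` and `csum2`):

```python
import numpy as np

a = np.linspace(0, 1, 10001)
b = a.tolist()

list_total = sum(b)     # what csum(b) computes
array_total = a.sum()   # what csum2(a) computes

# linspace(0, 1, 10001) has mean 0.5, so both sums are ~5000.5
print(abs(list_total - array_total) < 1e-6)  # prints True
```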
