Here is the Cython code:
# --- csum.pyx ---
def csum(list array):
    cdef int i, N = len(array)
    cdef double x, s = 0.0
    for i in range(N):
        x = array[i]
        s += x
    return s
Here is a setup.py to build it:
# --- setup.py ---
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(cmdclass={'build_ext': build_ext},
      ext_modules=[Extension("csum", ["csum.pyx"])])
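To build it in place (assuming Cython and a C compiler are installed), the usual invocation is:

python setup.py build_ext --inplace

which leaves the compiled csum module next to the sources, ready for import.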
To time it in IPython I created a simple file called test.ipy like so:
# --- test.ipy ---
import csum
import numpy

for i in [10, 100, 1000, 10000, 100000, 1000000, 10000000, 25000000]:
    print '-'*80
    print 'N =', i
    a = numpy.linspace(0, 1, i)
    b = a.tolist()
    print "Cython:",
    %timeit csum.csum(b)
    print "NumPy:",
    %timeit a.sum()
I ran it using IPython and here are the results (formatted a little):
       N       Cython       NumPy
      10       534 ns      10.1 µs
     100      1.76 µs      10.8 µs
    1000      15.3 µs      19.3 µs
   10000       150 µs       101 µs
  100000      1.75 ms       933 µs
 1000000      19.7 ms      9.24 ms
10000000       198 ms      92.1 ms
25000000       499 ms       231 ms
This was done on a 3 GHz Pentium 4, and clearly lists do quite well: at small sizes they outperform NumPy, and at really large sizes they are only about twice as slow. This is very good considering how general lists are.
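For reference, here is the pure-Python baseline that csum is a typed version of (hypothetical, not part of the original timings). Every iteration below runs in the interpreter on boxed float objects, which is exactly the overhead the cdef declarations remove:

# --- pysum.py (hypothetical pure-Python baseline) ---
def pysum(array):
    # Same algorithm as csum, but x and s are Python float objects
    # and the loop is executed by the interpreter.
    s = 0.0
    for x in array:
        s += x
    return s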
6 comments:
hi,
note that for 'silly benchmarks' like that, psyco is an easy alternative:
# --- psyco_sum.py ---
import psyco

@psyco.proxy
def csum(array):
    s = 0.0
    for i in xrange(len(array)):
        s += array[i]
    return s
# EOF
I got some rather 'same ballpark-ish' results for psyco on my box...
(interestingly enough, proxying a function which just returns sum(x) is somewhat slower...)
cheers,
sebastien.
To me, it also shows how much faster numpy could still get.
The numbers for big arrays are pretty good: a direct pure-C implementation of sum takes ~8 cycles/item; but again, you and I have a Pentium 4, which is particularly bad at floating-point computation if you don't use SSE.
To get much better results from numpy, we would have to use assembly (summing through ATLAS already gives me 0.05 s instead of 0.095 s for the 1e7-long array, and my CPU is a 3.2 GHz P4).
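One easy way to try the "summing through ATLAS" idea from Python (a sketch; blas_sum is a name I made up, and the win assumes numpy is linked against an optimized BLAS such as ATLAS) is to phrase the sum as a dot product:

# --- blas_sum.py (sketch of summing via BLAS, as the comment suggests) ---
import numpy as np

def blas_sum(a):
    # For 1-D contiguous float64 arrays, np.dot dispatches to the BLAS
    # ddot routine; the vector of ones is the price this trick pays for
    # getting vectorized code.
    return np.dot(a, np.ones_like(a))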
You should replace

for i in range(N):

with

for i from 0 <= i < N:

which is much faster.
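For concreteness, here is csum with that loop swapped in (a sketch; note that newer Cython releases compile a range() loop over a typed C integer to the same C for-loop, so this advice mainly mattered for old Cython/Pyrex versions):

# --- csum_from.pyx (csum using the legacy "for ... from" loop) ---
def csum(list array):
    cdef int i, N = len(array)
    cdef double x, s = 0.0
    # Pyrex-style integer loop; guaranteed to become a plain C for-loop.
    for i from 0 <= i < N:
        x = array[i]
        s += x
    return s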
FWIW this could be valuable in speeding up scipy.sparse.lil_matrix, which is painfully slow at the moment.
Came across your blog while playing with Cython.
My experience has been the opposite: list performance is horribly slow.
Try running the Cython-with-numpy code below for comparison:
# --- csum2.pyx ---
cimport cython
cimport numpy as np
import numpy as np

@cython.boundscheck(False)
def csum2(np.ndarray[double, ndim=1, mode="c"] parray):
    cdef:
        unsigned int i
        unsigned int N = len(parray)
        double x, s = 0.0
    for i in range(N):
        x = parray[i]
        s += x
    return s
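A usage sketch (the module and file names are my assumptions; the extension must be built with numpy's headers available, e.g. include_dirs=[numpy.get_include()] in the Extension):

# --- test2.ipy (hypothetical timing of csum2) ---
import numpy
import csum2

a = numpy.linspace(0, 1, 1000000)
# The typed buffer argument must be a C-contiguous float64 array; a plain
# list (or a non-contiguous slice) is rejected rather than summed slowly.
%timeit csum2.csum2(a)
%timeit a.sum()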