Python performance the easy(ish) way
Ctypes is great. Let's start with a simple (and trite) example: Summing a range of numbers.
Here it is in python:
def sumrange(arg): return sum(xrange(arg))
Very nice! But let's say we're summing a LOT of numbers, like 0 through 10**8 (that's 100,000,000).
>>> %timeit sumrange(10**2) 1000000 loops, best of 3: 1.42 us per loop >>> %timeit sumrange(10**8) 1 loops, best of 3: 1.13 s per loop >>> %timeit sumrange(10**10) 1 loops, best of 3: 701 s per loop
Well that's no fun. Let's try something else:
def sumrange2(arg): x = i = 0 while i < arg: x += i i += 1 return x
How'd we do?
>>> %timeit sumrange2(10**2) 100000 loops, best of 3: 14.9 us per loop >>> %timeit sumrange2(10**8) 1 loops, best of 3: 16.5 s per loop
Wow... that's even worse... (I dare not attempt this one with 10**10) so how do we speed it up? (don't suggest math tricks... this is the the new world of computing!)
*edit*: yes I know there's a constant time algoritm, n*(n+1)/2 . That isn't the purpose of this post
how about ctypes?
sumrange.c
#include <stdio.h> unsigned long long sumrange(unsigned long long arg) { unsigned long long i, x; x = 0; for (i = 0; i < arg; i++) { x = x + i; } return x; }
compile it:
$ gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c -O2
and the python code...
import ctypes sumrange_ctypes = ctypes.CDLL('/path/to/sumrange.so').sumrange sumrange_ctypes.restype = ctypes.c_ulonglong
I learned on StackOverflow that you have to set the restype or else python will assume it's an int (which will overflow)
And the winner is...?
>>> %timeit sumrange_ctypes(10**8) 1000000 loops, best of 3: 590 ns per loop
Wow that's fast... too fast. Let's experiment
>>> %timeit sumrange_ctypes(10**2) 1000000 loops, best of 3: 601 ns per loop >>> %timeit sumrange_ctypes(10**10) 1000000 loops, best of 3: 592 ns per loop >>> %timeit sumrange_ctypes(10**16) 1000000 loops, best of 3: 602 ns per loop
It seems gcc has optized this into a constant time algorithm LOL. Let's try without the optimize flag (for the record I tried with -O1 and it was still contant time)
$ gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c
... and in python (ipython in my case)...
>>> %timeit sumrange_ctypes(10**2) 1000000 loops, best of 3: 807 ns per loop >>> %timeit sumrange_ctypes(10**8) 1 loops, best of 3: 214 ms per loop >>> %timeit sumrange_ctypes(10**10) 1 loops, best of 3: 3.01 s per loop
Roundup!
iterations | 10**2 10**8 10**10 -------------------------------------------------- pure python | 1.42 us 1.13 s 701 s ctypes | 807 ns 214 ms 3.01 s
That is a hell of a performance boost!
For nodejs hackers, the node equivalent of ctypes is FFI (Foreign Function Interface): https://github.com/rbranson/node-ffi