Wednesday, April 14

Arg Parsing Micro-Benchmark

I'm working on a vector class for my shiny new Planar geometry Python library. When I was working on Lepton (which is mostly in C), I was keenly aware of the overhead of the generic arg parsing C-APIs (PyArg_ParseTupleAndKeywords and friends). Since this vector class largely consists of methods that do very little work, but may be called fairly often, I figured this overhead should be considered.

I implemented a class method to instantiate a vector with polar coordinates, which are converted to cartesian. Here it is in Python:
class Vec2(tuple):

    def __new__(self, x, y):
        return tuple.__new__(Vec2, 
            ((x * 1.0, y * 1.0)))

    @classmethod
    def polar(cls, angle, length=1.0):
        """Create a vector from polar coordinates"""
        radians = math.radians(angle)
        vec = tuple.__new__(cls, 
            (math.cos(radians) * length, 
             math.sin(radians) * length))
        return vec

Note, using tuple.__new__ in polar is an optimization that saves a layer of method calls and skips converting the x and y values to floats, since I know they are already. Running this through timeit with Python 3.1.2, I get:
>>> timeit.timeit('Vec2.polar(20, 10)', 
... 'from planar.vector import Vec2')
2.3068268299102783
>>> timeit.timeit('Vec2.polar(angle=20, length=10)', 
... 'from planar.vector import Vec2')
2.3426671028137207

Notice there's not much difference in timing between positional and keyword arguments.

Now let's implement the polar class method in C using generic arg parsing. Here's the method's code:
static PyObject *
Vec2_polar(PyTypeObject *type, PyObject *args, PyObject *kwargs)
{
    PlanarVec2Object *v;
    double angle;
    double length = 1.0;

    static char *kwlist[] = {"angle", "length", NULL};

    assert(PyType_IsSubtype(type, &PlanarVec2Type));
    if (!PyArg_ParseTupleAndKeywords(
        args, kwargs, "f|f:Vec2.polar()", 
        kwlist, &angle, &length)) {
        return NULL;
    }

    v = (PlanarVec2Object *)type->tp_alloc(type, 0);
    if (v != NULL) {
        angle = radians(angle);
        v->x = cos(angle) * length;
        v->y = sin(angle) * length;
    }
    return (PyObject *)v;
}

Here's the timing for the above:
>>> timeit.timeit('Vec2.polar(20, 10)', 
... 'from planar.cvector import Vec2')
1.045346975326538
>>> timeit.timeit('Vec2.polar(angle=20, length=10)', 
... 'from planar.cvector import Vec2')
1.578913927078247

This is certainly faster than the Python code, but not tremendously so, though that's not too surprising given the simplicity of this method. However, what is surprising is the performance difference between positional and keyword arguments. But, there it is.

Since this method takes two arguments, and they are both floats, maybe we can speed up the positional case, which should be common, by doing the arg parsing ourselves. We know that PyObject *args is a tuple, so let's take it apart manually and extract the angle and length. A slight complication is that the second length argument is optional, but we can handle it. Here's the beautiful result:
static PyObject *
Vec2_new_polar(PyTypeObject *type, PyObject *args, PyObject *kwargs)
{
    PyObject *angle_arg;
    PyObject *length_arg;
    PlanarVec2Object *v;
    int arg_count;
    double angle;
    double length = 1.0;

    static char *kwlist[] = {"angle", "length", NULL};

    assert(PyType_IsSubtype(type, &PlanarVec2Type));
    if (kwargs == NULL) {
        /* No kwargs, do fast manual arg handling */
        arg_count = PyTuple_GET_SIZE(args);
        if (arg_count != 1 && arg_count != 2) {
            PyErr_SetString(PyExc_TypeError, 
                "Vec2.polar(): wrong number of arguments");
            return NULL;
        }
        angle_arg = PyTuple_GET_ITEM(args, 0);
        if (!PyNumber_Check(angle_arg)) {
            PyErr_SetString(PyExc_TypeError, 
                "Vec2.polar(): expected number for argument angle");
            return NULL;
        }
        angle_arg = PyNumber_Float(angle_arg);
        if (angle_arg == NULL) {
            return NULL;
        }
        angle = PyFloat_AS_DOUBLE(angle_arg);
        Py_CLEAR(angle_arg);
        if (arg_count == 2) {
            length_arg = PyTuple_GET_ITEM(args, 1);
            if (!PyNumber_Check(length_arg)) {
                PyErr_SetString(PyExc_TypeError, 
                    "Vec2.polar(): expected number for argument length");
                return NULL;
            }
            length_arg = PyNumber_Float(length_arg);
            if (length_arg == NULL) {
                Py_DECREF(angle_arg);
                return NULL;
            }
            length = PyFloat_AS_DOUBLE(length_arg);
            Py_CLEAR(length_arg);
        }
    } else if (!PyArg_ParseTupleAndKeywords(
        args, kwargs, "f|f:Vec2.polar()", 
        kwlist, &angle, &length)) {
        return NULL;
    }

    v = (PlanarVec2Object *)type->tp_alloc(type, 0);
    if (v != NULL) {
        angle = radians(angle);
        v->x = cos(angle) * length;
        v->y = sin(angle) * length;
    }
    return (PyObject *)v;
}

So, if kwargs is NULL, then we extract the angle, and the length if present from the args tuple. We check if each are numbers, convert them to float objects (so you can pass in ints without complaint), then we extract the C double from each to store in the vector object's struct. Sure looks like a lot of code, let's see if it buys us anything:

>>> timeit.timeit('Vec2.polar(20, 10)', 
... 'from planar.cvector import Vec2')
0.46842408180236816 
Bam! Not bad at all! Of course if you do pass keyword args, we fall back to generic parsing, so that should perform the same as the last one:

>>> timeit.timeit('Vec2.polar(angle=20, length=10)', 
... 'from planar.cvector import Vec2')
1.5973320007324219
If I were feeling ambitious, I could extract the args from the PyObject *kwargs dict manually as well, and see some speedup. But at some point the added code and maintenance cost isn't worth it. I know the opportunity is there if needed, but also that passing args positionally is already a lot faster (a fact that is likely worth documenting).

I'm writing this planar library such that every class is coded in Python and C. Also the library is compatible with both Python 2.6+ and Python 3.x. If you're interested in an example of how to write a package that's compatible with Python 2 and 3 with C-extensions, check it out.

7 Comments:

Blogger Knarkles said...

Thanks for the tip, I didn't even think about optimizing arg parsing. I'll have to try that on PyEigen. :)

1:44 AM  
OpenID Tartley said...

This is really exciting to read about - I hadn't heard about Planar before.

What are the goals / expected uses of Planar? How will it compare with other such libraries out there - for example, I've used Shapely an amount in the past. I had a quick look in the docs directory on bitbucket, but couldn't find anything.

In particular, I'm dimly aware that Shapely, in performing geometric operations, uses the full dimensionally-extended nine-intersection model. (eg. the intersection of two adjacent polygons that share one edge will return a line). This is appropriate for GIS, from which Shapely comes, but a more naive area-of-overlap intersection would probably return an empty geometry in this case, which might be more appropriate (and faster!) for game-like applications.

Your thoughts much appreciated.

2:06 AM  
Blogger ΤΖΩΤΖΙΟΥ said...

As a side note:

import cmath
>>> help(cmath.rect)
Help on built-in function rect in module cmath:

rect(...)
rect(r, phi) -> z: complex

Convert from polar coordinates to rectangular coordinates.

>>> help(cmath.polar)
Help on built-in function polar in module cmath:

polar(...)
polar(z) -> r: float, phi: float

Convert a complex from rectangular coordinates to polar coordinates. r is
the distance from 0 and phi the phase angle.

5:36 AM  
Blogger casey said...

@Tartley - The goals of Planar are to supply a high-level, high-performance 2D geometry (math) api aimed at games and the like under a liberal license. Before I do a release, I intend to get the docs up to speed. I'll probably release 0.1 once Vec2 is complete and documented. It's foundational, of course, but pretty useful by itself.

What I intend to do differently than something like Shapely, is implement tightly-coded batch operations that can do a bunch of work in C without resorting to lots of looping from Python. The Vec2 class itself is just basic, to do batch operations they'll be things like PointArray and Region.

Planar will do generic intersections between shapes (I intend to support Point, PointArray, Rect, Poly, Circle, Ellipse, arbitrary Region, etc.). Being performance-minded, I suspect optimizations like you suggest will be useful, though I haven't thought about it at that level yet. I intend Region to be a lazy composite construct, so you can ask questions about it without necessarily fully realizing the operations that created it. Also there will be high-level apis around many-to-many collisions using algorithms like sweep and prune and GJK.

7:44 AM  
Blogger casey said...

@ΤΖΩΤΖΙΟΥ - I've (ab)used complex numbers as vectors before, and they are useful, though they are difficult to abstract within Python to the degree I want. In particular you cannot subclass them with much benefit because arithmetic operations on your subclass instances return complex instances rather than instances of your subclass. That means you have to override all of the arithmetic methods, basically negating the advantages of using it.

But, if you must stick with pure python and the stdlib, they are a good way to get fast vector math.

7:51 AM  
Blogger illume said...

Hi,

yeah, PyArg_ParseTupleAndKeywords is kind of slow. Nice avoidance of it :) Looks like a good speedup... in exchange for not so nice code. But worth it I think!

There are also METH_* arguments which can allow faster calls if you only allow positional arguments, and also functions with no arguments can be special cased. Not as usable though without keyword arguments. I think why people say the python C API is smoking crack is because of all the METH_ ;)

By the way, there is a C vector/math module being developed in pygame too over the last several months. Might be cool to join forces with the two libraries? Or maybe it might have a few useful bits in it for you.

http://svn.seul.org/viewcvs/viewvc.cgi/trunk/src/math.c?root=PyGame&sortby=date&view=log

Lepton is really shaping up with a bunch of goodies now! :)

cu.

8:17 AM  
Blogger casey said...

@illume - Yeah, I'm aware of the vector math work in pygame, I actually did a code review of it early on. Basically Vec2 in planar only the tip of the iceberg, and would be the only common point between us. There are a few other things that make me roll my own here

- Pygame is a monolith, and the wrong kind of dependency for what I intend to be a lightweight math library. I'm trying for a more modular approach with my Grease framework, which planar will be an integral part. And I don't want Grease to depend on Pygame.

- The LGPL is simply not for me. I understand the intent, but it isn't the way I roll.

- There are also some philosophical api differences, (i.e., their vectors are mutable).

And there's also the perverse pleasure I get from implementing it myself. Which is offset with a little "reinventing the wheel" guilt, but oh well.

8:37 AM  

Post a Comment

<< Home