I'm working on a vector class for my shiny new
Planar geometry Python library. When I was working on Lepton (which is mostly in C), I was keenly aware of the overhead of the generic arg parsing C-APIs (
PyArg_ParseTupleAndKeywords and friends). Since this vector class largely consists of methods that do very little work, but may be called fairly often, I figured this overhead should be considered.
I implemented a class method to instantiate a vector with polar coordinates, which are converted to cartesian. Here it is in Python:
class Vec2(tuple):
def __new__(self, x, y):
return tuple.__new__(Vec2,
((x * 1.0, y * 1.0)))
@classmethod
def polar(cls, angle, length=1.0):
"""Create a vector from polar coordinates"""
radians = math.radians(angle)
vec = tuple.__new__(cls,
(math.cos(radians) * length,
math.sin(radians) * length))
return vec
Note, using
tuple.__new__ in
polar is an optimization that saves a layer of method calls and skips converting the x and y values to floats, since I know they are already. Running this through timeit with Python 3.1.2, I get:
>>> timeit.timeit('Vec2.polar(20, 10)',
... 'from planar.vector import Vec2')
2.3068268299102783
>>> timeit.timeit('Vec2.polar(angle=20, length=10)',
... 'from planar.vector import Vec2')
2.3426671028137207
Notice there's not much difference in timing between positional and keyword arguments.
Now let's implement the
polar class method in C using generic arg parsing. Here's the method's code:
static PyObject *
Vec2_polar(PyTypeObject *type, PyObject *args, PyObject *kwargs)
{
PlanarVec2Object *v;
double angle;
double length = 1.0;
static char *kwlist[] = {"angle", "length", NULL};
assert(PyType_IsSubtype(type, &PlanarVec2Type));
if (!PyArg_ParseTupleAndKeywords(
args, kwargs, "f|f:Vec2.polar()",
kwlist, &angle, &length)) {
return NULL;
}
v = (PlanarVec2Object *)type->tp_alloc(type, 0);
if (v != NULL) {
angle = radians(angle);
v->x = cos(angle) * length;
v->y = sin(angle) * length;
}
return (PyObject *)v;
}
Here's the timing for the above:
>>> timeit.timeit('Vec2.polar(20, 10)',
... 'from planar.cvector import Vec2')
1.045346975326538
>>> timeit.timeit('Vec2.polar(angle=20, length=10)',
... 'from planar.cvector import Vec2')
1.578913927078247
This is certainly faster than the Python code, but not tremendously so, though that's not too surprising given the simplicity of this method. However, what is surprising is the performance difference between positional and keyword arguments. But, there it is.
Since this method takes two arguments, and they are both floats, maybe we can speed up the positional case, which should be common, by doing the arg parsing ourselves. We know that
PyObject *args is a tuple, so let's take it apart manually and extract the angle and length. A slight complication is that the second
length argument is optional, but we can handle it. Here's the beautiful result:
static PyObject *
Vec2_new_polar(PyTypeObject *type, PyObject *args, PyObject *kwargs)
{
PyObject *angle_arg;
PyObject *length_arg;
PlanarVec2Object *v;
int arg_count;
double angle;
double length = 1.0;
static char *kwlist[] = {"angle", "length", NULL};
assert(PyType_IsSubtype(type, &PlanarVec2Type));
if (kwargs == NULL) {
/* No kwargs, do fast manual arg handling */
arg_count = PyTuple_GET_SIZE(args);
if (arg_count != 1 && arg_count != 2) {
PyErr_SetString(PyExc_TypeError,
"Vec2.polar(): wrong number of arguments");
return NULL;
}
angle_arg = PyTuple_GET_ITEM(args, 0);
if (!PyNumber_Check(angle_arg)) {
PyErr_SetString(PyExc_TypeError,
"Vec2.polar(): expected number for argument angle");
return NULL;
}
angle_arg = PyNumber_Float(angle_arg);
if (angle_arg == NULL) {
return NULL;
}
angle = PyFloat_AS_DOUBLE(angle_arg);
Py_CLEAR(angle_arg);
if (arg_count == 2) {
length_arg = PyTuple_GET_ITEM(args, 1);
if (!PyNumber_Check(length_arg)) {
PyErr_SetString(PyExc_TypeError,
"Vec2.polar(): expected number for argument length");
return NULL;
}
length_arg = PyNumber_Float(length_arg);
if (length_arg == NULL) {
Py_DECREF(angle_arg);
return NULL;
}
length = PyFloat_AS_DOUBLE(length_arg);
Py_CLEAR(length_arg);
}
} else if (!PyArg_ParseTupleAndKeywords(
args, kwargs, "f|f:Vec2.polar()",
kwlist, &angle, &length)) {
return NULL;
}
v = (PlanarVec2Object *)type->tp_alloc(type, 0);
if (v != NULL) {
angle = radians(angle);
v->x = cos(angle) * length;
v->y = sin(angle) * length;
}
return (PyObject *)v;
}
So, if
kwargs is
NULL, then we extract the angle, and the length if present from the
args tuple. We check if each are numbers, convert them to float objects (so you can pass in ints without complaint), then we extract the C
double from each to store in the vector object's struct. Sure looks like a lot of code, let's see if it buys us anything:
>>> timeit.timeit('Vec2.polar(20, 10)',
... 'from planar.cvector import Vec2')
0.46842408180236816
Bam! Not bad at all! Of course if you do pass keyword args, we fall back to generic parsing, so that should perform the same as the last one:
>>> timeit.timeit('Vec2.polar(angle=20, length=10)',
... 'from planar.cvector import Vec2')
1.5973320007324219
If I were feeling ambitious, I could extract the args from the
PyObject *kwargs dict manually as well, and see some speedup. But at some point the added code and maintenance cost isn't worth it. I know the opportunity is there if needed, but also that passing args positionally is already a lot faster (a fact that is likely worth documenting).
I'm writing this planar library such that every class is coded in Python and C. Also the library is compatible with both Python 2.6+ and Python 3.x. If you're interested in an example of how to write a package that's compatible with Python 2 and 3 with C-extensions,
check it out.