Thursday, July 17

Why is C faster than Python?

Questioning the obvious is something we do a lot accidently (and get chided for it), but occasionally it yields some useful insights. A while back, there was a discussion on the pygame mailing list about languages and speed. The poster was insistent that there was no good reason why C was faster than Python. Below is my explanation, which in writing I realized I had never voiced this way before:

When there is no better algorithm, then using a higher performing language is about the only option you have for better performance. There is an even better option, dedicated hardware, but that is far beyond the reach of most mere mortals. But that's the reason why we have things like SIMD instruction sets, GPUs and physics co-processors.

All this conjecturing about Python performance in relation to compiled languages makes me wonder. Why is the performance of C code not as good as that of well-written hand-coded assembler? Surely if the machine can do it, C can?

The problem boils down to one of specificity. Any language that tells the computer more specifically what to do can be faster than one that doesn't. Unfortunately, telling the computer what to do specifically is not a productive way to solve most problems. I'd rather not have to tell the computer which machine codes to execute, but if I did, and I was clever, I could make my program the fastest possible -- for a given specific architecture at least.

Python is more abstract and general than C, C is more abstract and general than machine code. Therefore machine code is potentially faster than C, and C is potentially faster than Python. And unless you reduce the generality of Python, it will always be possible to write faster programs in C, given a competent compiler.

JITs (e.g., psyco) can help mitigating this, but they too are ultimately constrained by Python's expressive power.

So unless you are proposing to reduce Python's expressive power, you are waging a losing battle. Until which time machine intelligence exceeds our own, it will always be possible for a human programmer to get higher performance from the lower-level language. And I further propose that when the time arrives that machine intelligence exceeds our own, this will not be the problem foremost on my mind ;^)


Anonymous Rachelle said...

Good for people to know.

2:32 PM  
Blogger Duncan McGreggor said...

"And I further propose that when the time arrives that machine intelligence exceeds our own, this will not be the problem foremost on my mind ;^)"

Ah, that was a classic line ;-) A nice laugh-out-loud moment...

2:58 PM  
Blogger swiftcoder said...

I would propose a slight counter-argument to this line of reasoning. In theory at least, the compiler/interpreter for a higher level language has much more latitude to optimise, and a better idea of what is going on.

For instance, the python interpreter knowns that the for statement is a loop, and can optimise accordingly.

The C compiler only knows that the for statement contains something loop-like - it could be a tree traversal, it could back-track, etc.

And finally, the assembler only sees a stream of instructions, and cannot optimise much (if at all).

10:51 AM  
Blogger casey said...

A performance challenge, particularly acute with with dynamic languages like python, however is that the same code can mean dramatically different things depending on the types of the inputs. This means it is not possible to generate machine code until runtime, and therefore any optimizations on that generated machine code, must also be incurred at runtime. Anyone who has waited for code to compile knows that this can be an expensive proposition. Of course in real life, the input types often do not vary, so you could in theory cache the machine code, but it still cannot be generated from only the information present in the code.

Of course CPython does not actually generate machine code at all, it interprets the instruction bytecode at runtime. Short of a magical CPU that can execute python bytecode as it's machine code, it is hopeless for such an interpreter to compete performance-wise.

Also, how would a theoretical python compiler know that a for statement was anything other than a simple loop any better than a C compiler could? There is no "tree traversal" abstraction at the language-level in Python anymore than in C.

You are totally correct about assembler, however I think you missed my point, which is that a suitably clever human can usually write faster special-purpose code in assembler than a compiler can generate. The labor and knowledge required with today's processors rarely makes this worthwhile, however. Though there is still a fair bit of hand-coded assembler written for performance critical code to take full advantage of special processor features, such as SIMD instructions, etc.

11:21 AM  
Blogger swiftcoder said...

Ja, my argument works much better if we are talking about python with a JIT. I haven't tried the newer JITs available for python, but I hear the performance benefits are startling.

As to the loops, a python for loop is in fact a foreach loop, and as such it only supports linear iteration.

The C for loop is far more general, and you can literally do anything with it - including things which involve no actually iteration at all.

For the foreach loop we can (in theory) implement very efficient iteration structures, which we aren't able to enforce on the more general for loop.

Of course, once you hit the real world, theory often goes out the window...

However, I do think the reason a human can write better optimised assembly, is because the human knows the intention behind the code, not just the direct meaning available to the compiler.

11:34 AM  
Anonymous Ian Mallett said...

I had also thought that Python was built on C, and so Python could never be faster than what it's built on.

What would be interesting is a "Python" compiler that compiles Python into C-style instructions--or possibly further into a lower level. Ideally, a compiler could expand the whole thing down to incredibly specific dealings. That way, when run, Python wouldn't suffer from having to interpret things--they'd be pre-interpreted.

11:33 PM  
Blogger casey said...

I know tinypy and shedskin can be used to translate Python to C++. And of course there's Cython, which is a Python dialect that generates C code.

9:05 AM  
Anonymous Anonymous said...

So why do Fortran compilers regularly produce faster numeric code than C?

10:08 AM  
Blogger casey said...

I would argue that fortran is not a higher-level language than C. In fact it is a language developed with scientific and mathematical applications in mind. To me that makes it more specialized, and therefore less general than C, which does not cater specifically to particular applications -- it disregards them all equally ;^)

10:03 AM  

Post a Comment

<< Home