EngineeringFantasy

Rifts in the Community

Tuesday, 29 April 2014

I love Python. It's the first programming language that I really stuck to. When I was first starting out, Python had a lot going for it:

  • MIT taught its Introductory CS Course in Python
  • Python was widely used in the Scientific community
  • It was very intuitive and easy to use off the bat
  • The community on StackOverflow was super nice too

So, my journey began with MITx's Introduction to Computer Science course, and although it was the first time such a course was offered on edX, it was pretty darn awesome. The course was mostly taught by John Guttag. Most of the lectures made sense, and the homework questions were a lot of fun to complete. I made a lot of diverse friends too; one guy was a plumber who was learning to code so that he could help his son learn as well (what an awesome dad!).

I never finished the course, but it gave me enough to get started with Python and programming. I started asking questions on StackOverflow, and soon started answering them too. I soon found one of the hidden gems of the site: its chat room. I was enthralled by the community that grew around Python; it was, and continues to be, a super helpful community.

Diving Deeper

However, as I started diving deeper into things, I soon came to realize a few of Python's limitations. First and foremost, Python was very slow in some cases, especially when it came to raw loops or dealing with numbers. I personally was super grateful for Python's integer promotion (it helped me a lot with a couple of assignments and Project Euler too). However, I also realized there were many cases where Python could be optimised. The Python community responded by dividing the problem into different parts:

  • To make numerical data crunching (for lack of a better term) faster, the community came up with numpy.
  • For wrapping third-party C libraries, there came SWIG by David Beazley and Cython, which evolved from the Pyrex project. Please note that Cython is Python-specific, whereas SWIG can be used with other programming languages as well. If you'd like to know more about these two tools, I feel that this video does a good job of exploring what they are, and highlights the features that make them different.
  • For the Python implementation itself, faster alternatives have popped up, like PyPy (a faster implementation of Python thanks to its JIT compiler). Jython and IronPython both take advantage of their respective VMs (the JVM and the .NET CLR), bypassing Python's GIL. Most recently, there is Pyston. There are others, but they have not gained enough traction.
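To make the "slow raw loops" point concrete, here is a minimal sketch using only the standard library. The function names `loop_sum` and `builtin_sum` are my own, made up for illustration. It compares an explicit Python-level loop with the built-in `sum`, which does the same work inside CPython's C code, and it also shows integer promotion at work. The timings depend entirely on your machine and interpreter, so I am not quoting any numbers.

```python
import timeit


def loop_sum(n):
    """Add 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Same result, but the looping happens inside the C-implemented built-in."""
    return sum(range(n))


# Integer promotion: the value simply grows past 64 bits, no overflow.
big = 2 ** 64 + 1

if __name__ == "__main__":
    n = 200000
    # Time five runs of each version; on CPython the raw loop is usually slower.
    print("raw loop:", timeit.timeit(lambda: loop_sum(n), number=5))
    print("builtin :", timeit.timeit(lambda: builtin_sum(n), number=5))
    print("big int :", big, "needs", big.bit_length(), "bits")
```

Running the same file under a different implementation (PyPy, for example) is an easy way to see how much the choice of interpreter matters for this kind of loop.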

Now, I can tell you that numpy (and Python's data tools in general) was a huge success, and hence Python's scientific community has thrived. I can honestly say, with pride, that Python's data tools and libraries are arguably best of breed. PyData, a conference dedicated to the discussion of Python's data tools (and boy are there plenty of them), has been growing in both popularity and influence.

Challenges

Now, the tools that the scientific community employs are third-party libraries; they do not change the implementation of Python itself. The projects that are trying to make headway on the implementation, however, face two really big challenges, and the two are actually quite interconnected. So, what does making Python faster mean? In essence there are two parts to it:

  • Making Python code itself execute faster
  • Better support for concurrency in Python

Now in order to meet the first goal, PyPy implemented a JIT compiler and it WORKS! You can check it out yourself. Pyston, on the other hand, is trying to achieve JIT compilation through a different means (one that PyPy attempted before without success; it still remains a theoretical possibility). The first challenge here is to make an implementation successful enough to replace CPython. I feel that if the two camps, Pyston with its Dropbox backing and PyPy with its years of experience, were to combine their efforts, then we might be able to see good results even faster. I know it's just conjecture, but this is something that might divide our community into two sides: one that uses PyPy and another that uses Pyston. Right now, Pyston is working towards supporting only Python 2.

Secondly, trying to get rid of the GIL has been a major problem (the GIL is the reason why threading will often slow down CPU-bound Python programs), and as Alex Gaynor put it, a subtle reason for the division in our community:

I think there's been little uptake because Python 3 is fundamentally unexciting. It doesn't have the super big ticket items people want, such as removal of the GIL or better performance (for which many are using PyPy).

Most attempts at making Python faster work mostly with Python 2.7.*. PyPy 3 is not fully stable yet, and pypy-stm, which is PyPy's attempt at removing the GIL (iirc), is aimed at Python 2 and not 3. Thus we come to the second big problem, and that is the division between Python 2 and 3.
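The GIL's effect on threads, mentioned above, can be sketched with nothing but the standard library. This is a rough illustration, not a benchmark: the helper names (`count_down`, `run_sequential`, `run_threaded`) are made up for this post, and it assumes a standard CPython build that has the GIL. On such a build the threaded run of a pure-Python, CPU-bound loop typically takes about as long as the sequential run, or longer, because only one thread executes Python bytecode at a time.

```python
import threading
import time


def count_down(n, results):
    """Pure-Python, CPU-bound busy loop; records how many steps it ran."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    results.append(steps)


def run_sequential(n, workers):
    """Run the loop `workers` times, one after another."""
    results = []
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n, results)
    return time.perf_counter() - start, results


def run_threaded(n, workers):
    """Run the same loops, one per thread, and wait for all of them."""
    results = []
    threads = [threading.Thread(target=count_down, args=(n, results))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    n, workers = 2000000, 2
    seq_time, _ = run_sequential(n, workers)
    thr_time, _ = run_threaded(n, workers)
    print("sequential: %.3fs" % seq_time)
    print("threaded:   %.3fs" % thr_time)
```

Threads still help when the work is I/O-bound, since the GIL is released while waiting; it is the CPU-bound case like this one where they do not buy you anything.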

Solutions

Pool our resources

We need to get the experts in the Python community together. We also need to get the companies that use Python to back a project that can make a Python implementation fast enough to compete with JavaScript's V8 engine, and one that supports concurrency well.

Admitting our problems

Everyone knows that Python 3 has issues. Every single time someone says "Unicode", I scream "Python 3" inside my head. Right now the Python community is in a state of denial regarding Python 3.

We need to get out of it, and start working towards solving the problem.

Make it easier to contribute

Pythonistas want to help. We need to have more resources describing the problems that we face right now. The PyPy project does an excellent job of that, and the effort they put into their FAQs is laudable, with the exception of describing what RPython is:

RPython is a restricted subset of Python that is amenable to static analysis. Although there are additions to the language and some things might surprisingly work, this is a rough list of restrictions that should be considered. Note that there are tons of special cased restrictions that you’ll encounter as you go. The exact definition is “RPython is everything that our translation tool-chain can accept” :)
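To give a feel for what "amenable to static analysis" can mean in practice, here is a small sketch. It is my own illustration, based on my understanding of the restriction in PyPy's documentation that a variable should hold values of at most one type at each control flow point. Both functions are made-up names, both are perfectly valid Python, and I have not run either through the RPython translation tool-chain, so treat the comments as an assumption rather than a verdict.

```python
def not_rpython_friendly(flag):
    """Valid Python, but `result` is an int on one path and a str on the other.

    Assumption: this mixing of types at the point where the branches merge
    is the kind of thing RPython's type inference would be expected to reject.
    """
    if flag:
        result = 42
    else:
        result = "forty-two"
    return result


def rpython_friendly(flag):
    """Same shape, but `result` is an int on every path."""
    if flag:
        result = 42
    else:
        result = -1
    return result


if __name__ == "__main__":
    # Under CPython both run fine; the difference only matters to a static analyser.
    print(type(not_rpython_friendly(True)), type(not_rpython_friendly(False)))
    print(type(rpython_friendly(True)), type(rpython_friendly(False)))
```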

Resources

If you want to get started on understanding what these different technologies are, then these resources will help:

The Global Interpreter Lock (GIL)

David Beazley's talk on the GIL was pretty much the only talk I needed to get started, and (sadly) I did not do much more than that.

PyPy

Beazley's keynote on PyPy was an excellent talk (if you haven't realized already, I adore Beazley's talks). He really dives into the guts of PyPy in this one, and shows you how it works from the inside.

PyPy and Software Transactional Memory is a good talk, but not the best. It's given by members of PyPy's core team, and introduces what STM is. Although this talk leaves a lot to be desired, you can still walk away with some key points regarding what STM is and why it's useful.

Pyston

This is a new attempt at making a faster Python implementation, backed by Dropbox. You can take a look at their progress on their GitHub project page.

Python 2 to 3

Armin Ronacher's blog post on porting from 2 to 3 is really good stuff. He's been a strong critic of Python 3, but I feel that he justifies his position well and regularly.