Migrating from Python 2 to Python 3

At Intempus we are using Python 2.7, and when I started I decided that we should migrate to Python 3 at some point. Python 2 reaches its end of life in 2020, so now is a great opportunity to make the migration. We also use the Django framework, which is dropping Python 2 support as well. So yesterday evening I started looking into how other people have tackled the migration. Luckily, at Intempus we have quite a lot of unit tests and pretty good coverage, which makes it much easier to refactor the code. If you do not have unit tests and good coverage, you should start there.

Is it feasible?

Dropbox has a huge Python codebase and migrated it incrementally from Python 2 to 3; they are now running only on Python 3. Instagram has also migrated its huge codebase from 2 to 3. So we know it is possible. I started by looking at how big the Intempus codebase is: running "find . -name '*.py' | xargs wc -l" yields 146,731 lines of code, which was more than I expected. Then I looked at how many files we have: 1,121, which gives us ~130 lines of code per file. A rough estimate is that it would take 10 minutes on average to make a file Python 3 compliant, which yields ~187 hours of work (1,121 × 10 minutes). In Denmark, with a 37-hour working week, that is about 5 weeks. So maybe not that bad. But what about external libraries?
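The same count and back-of-the-envelope estimate can be scripted, so it is easy to rerun as the porting progresses. This is a minimal sketch that assumes it is run with Python 3 (it uses pathlib); the 10 minutes per file and the 37-hour week are just the assumptions from the estimate above, and the function names are hypothetical.

```python
from pathlib import Path


def count_python_code(root="."):
    """Return (number of .py files, total number of lines) under root.

    Roughly what "find . -name '*.py' | xargs wc -l" reports.
    """
    files = [p for p in Path(root).rglob("*.py") if p.is_file()]
    total_lines = 0
    for path in files:
        # Read bytes so Python 2 era files with odd encodings cannot break the count.
        with path.open("rb") as handle:
            total_lines += sum(1 for _ in handle)
    return len(files), total_lines


def estimate_effort(n_files, minutes_per_file=10, hours_per_week=37):
    """Back-of-the-envelope porting estimate: (total hours, working weeks)."""
    hours = n_files * minutes_per_file / 60
    return hours, hours / hours_per_week


if __name__ == "__main__":
    n_files, n_lines = count_python_code(".")
    hours, weeks = estimate_effort(n_files)
    print("%d files, %d lines, ~%d lines/file" % (n_files, n_lines, n_lines // max(n_files, 1)))
    print("~%.0f hours, ~%.1f working weeks" % (hours, weeks))
```

With the numbers above, estimate_effort(1121) gives about 186.8 hours and 5.05 working weeks.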

External libraries

Getting more excited about the move to Python 3, I spent the whole night reading about other people's opinions and experiences with the migration. The guide I ended up liking the most is "The Conservative Python 3 Porting Guide" (https://portingguide.readthedocs.io/en/latest/). It also raises the problem of external libraries, so the strategy I plan to go with is as follows:

  1. If the library has a Python 3 version, there should be no problem.
  2. If the library is Python 2 only and open source, we can port it ourselves if that is feasible.
  3. Otherwise, switch to another library that is Python 3 compatible.

Depending on the libraries, this can be a lot of work, so we will have to investigate.
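A first pass at that investigation could use the trove classifiers that packages declare in their metadata. This sketch assumes it runs under Python 3.8+ (for importlib.metadata) in a virtualenv where our requirements are installed; the status labels are made up for this example. Classifiers are self-declared, so a missing or stale classifier only means the package needs a manual look, not that it is incompatible.

```python
from importlib.metadata import distributions  # Python 3.8+

PY3 = "Programming Language :: Python :: 3"
PY2 = "Programming Language :: Python :: 2"


def python3_status(classifiers):
    """Map a package's trove classifiers to a rough porting status (labels are illustrative)."""
    classifiers = classifiers or []
    if any(c.startswith(PY3) for c in classifiers):
        return "python3"       # strategy 1: a Python 3 version exists
    if any(c.startswith(PY2) for c in classifiers):
        return "python2-only"  # strategy 2 or 3: port it or replace it
    return "unknown"           # no Python classifiers: check by hand


def report_installed_packages():
    """Print the status of every distribution installed in the current environment."""
    rows = []
    for dist in distributions():
        name = dist.metadata.get("Name") or "?"
        rows.append((name.lower(), name, python3_status(dist.metadata.get_all("Classifier"))))
    for _, name, status in sorted(rows):
        print("%-35s %s" % (name, status))


if __name__ == "__main__":
    report_installed_packages()
```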

Python 3 compatible Python 2 code

While investigating our external libraries, we can work on our own codebase. The porting guide mentioned earlier has a detailed overview of how to write Python 3 compatible Python 2 code. We want to create a CLI that developers can use to scan their code and find out if something is not valid Python 3 code. The CLI should have a flag that gives suggestions on how the developer can solve each problem. Once we have the tool, we need a guide on how to use it. That way, instead of having to port everything at once, we can port the code gradually as we go.
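A minimal sketch of such a CLI, assuming it runs under Python 3: it first checks that a file parses with the Python 3 grammar, then walks the syntax tree for a few Python 2-only builtins and dict methods. The name py3check, the --suggest flag and the check lists are hypothetical and far from exhaustive; some suggestions assume the six compatibility library. Because it only looks at names, it can give false positives (for example a local variable called unicode).

```python
"""py3check: a hypothetical CLI that flags code that is not valid Python 3."""
import argparse
import ast
import sys
from pathlib import Path

# Illustrative, not exhaustive: Python 2-only builtins and a suggested replacement.
PY2_NAMES = {
    "xrange": "use range()",
    "unicode": "use str (or six.text_type while supporting both versions)",
    "basestring": "use str (or six.string_types while supporting both versions)",
    "raw_input": "use input() (or six.moves.input while supporting both versions)",
}
# Python 2-only dict methods.
PY2_METHODS = {
    "iteritems": "use .items()",
    "iterkeys": "use .keys()",
    "itervalues": "use .values()",
    "has_key": "use the 'in' operator",
}


def check_source(source, filename="<string>"):
    """Return sorted (line, message, suggestion) tuples for problems in source (str or bytes)."""
    try:
        tree = ast.parse(source, filename=filename)  # parsed with the running Python 3 grammar
    except SyntaxError as exc:
        return [(exc.lineno or 0, "not valid Python 3 syntax: %s" % exc.msg,
                 "fix the syntax first (e.g. print statements, 'except X, e:')")]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in PY2_NAMES:
            problems.append((node.lineno, "Python 2-only name '%s'" % node.id, PY2_NAMES[node.id]))
        elif isinstance(node, ast.Attribute) and node.attr in PY2_METHODS:
            problems.append((node.lineno, "Python 2-only method '.%s()'" % node.attr,
                             PY2_METHODS[node.attr]))
    return sorted(problems)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Report code that is not valid Python 3.")
    parser.add_argument("paths", nargs="+", help="files or directories to scan")
    parser.add_argument("--suggest", action="store_true", help="show how to fix each problem")
    args = parser.parse_args(argv)
    found = 0
    for root in map(Path, args.paths):
        files = sorted(p for p in root.rglob("*.py") if p.is_file()) if root.is_dir() else [root]
        for path in files:
            # Read bytes so the PEP 263 coding declaration in each file is honoured.
            for line, message, hint in check_source(path.read_bytes(), str(path)):
                found += 1
                print("%s:%d: %s" % (path, line, message))
                if args.suggest:
                    print("    suggestion: %s" % hint)
    return 1 if found else 0  # non-zero exit code so the tool can also fail a build


if __name__ == "__main__":
    sys.exit(main())
```

A developer would run something like "python3 py3check.py --suggest myapp/" and fix the reported lines before committing.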

For our continuous integration, we want a way to check whether the code being checked in is Python 3 compatible. There are several options and I have not decided on one yet. An interesting one is to run static analysis on the code under Python 2 and under Python 3 and compare the results. If the results look the same, we are somewhat closer to running the same program on both Python 2 and Python 3. I say somewhat because Python is a dynamic language, so we cannot be 100% sure using static analysis alone.
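Whichever option we choose, a cheap first gate is to let Python 3 compile every changed file in a commit. This sketch assumes CI passes it the changed file names; the script name is hypothetical, and it only catches syntax-level problems, so it is subject to the same limits as static analysis: passing it says nothing about runtime behaviour.

```python
"""ci_py3_check.py (hypothetical name): fail the build if changed files are not valid Python 3."""
import sys
from pathlib import Path


def py3_syntax_errors(paths):
    """Return {path: message} for each existing .py file the running Python 3 cannot compile."""
    errors = {}
    for name in paths:
        path = Path(name)
        if path.suffix != ".py" or not path.is_file():
            continue  # skip deleted files and non-Python files
        try:
            # compile() only checks syntax; nothing in the file is executed.
            compile(path.read_bytes(), str(path), "exec")
        except (SyntaxError, ValueError) as exc:  # ValueError: e.g. null bytes on older Pythons
            errors[str(path)] = str(exc)
    return errors


if __name__ == "__main__":
    failures = py3_syntax_errors(sys.argv[1:])
    for path, message in sorted(failures.items()):
        print("%s: %s" % (path, message))
    sys.exit(1 if failures else 0)
```

Assuming the main branch is origin/master, a CI step could run "git diff --name-only origin/master... | xargs python3 ci_py3_check.py".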

Next steps

  • Investigate external libraries
  • Prepare tooling for supporting developers in writing Python 3 compatible code
  • Get developers on board with the switch to Python 3 and have them read about the differences between the two versions
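As a small example of what Python 3 compatible Python 2 code looks like, here is a hypothetical helper module that behaves the same on Python 2.7 and Python 3. The __future__ imports and io.open follow the kind of advice given in the porting guide; the function names and data are made up for illustration.

```python
# Works the same on Python 2.7 and Python 3 (names and data are illustrative).
from __future__ import absolute_import, division, print_function

import io


def average_hours(hours_by_employee):
    """Average of the values. Without the division import, 15 / 2 would give 7 on Python 2."""
    total = sum(hours_by_employee.values())  # .values() exists on both; .itervalues() is Python 2 only
    return total / len(hours_by_employee)


def read_text(path):
    """Read a UTF-8 file as text (unicode on Python 2, str on Python 3)."""
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


print(average_hours({"anna": 7, "bo": 8}))  # 7.5 on both versions
```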

Sources

  • https://blogs.dropbox.com/tech/2019/02/incrementally-migrating-over-one-million-lines-of-code-from-python-2-to-python-3/
  • https://portingguide.readthedocs.io/en/latest/
  • Thalmann