Have you ever wondered what exactly happens when you run pip install? This post gives a detailed overview of the steps involved in the past, and how it all changes with the adoption of PEP-517 and PEP-518.
In my previous post I described how it's possible to install three types of content: source trees, source distributions, and wheels. Only the last two types are uploaded to PyPI, the central Python repository. However, one can still get their hands on a source tree (by feeding, for example, a git URL to pip). The advantage of wheels over the other two is that they do not require any build operation to happen on the user's machine; installation is just download and extract.
Building Python packages
Now, independent of where the build happens (on the user's or the developer's machine), you still need to build the package (either the sdist or the wheel). To do this, you need some builder in place. Historically, the need for third-party packages manifested itself early on. Following the principle that Python comes with batteries included, the distutils package was added to the Python standard library in the year 2000, with Python 1.6. It introduced the concept of a setup.py file containing the build logic, triggered via python setup.py <command>.
It allowed users to package code as libraries, but it did not have features such as dependency declaration and automatic dependency installation. Furthermore, its improvement lifecycle was directly tied to the core interpreter's release cycle. In 2004, setuptools was created: built on top of distutils, it extended it with other excellent features. It quickly became so prevalent that most Python installations started to provide it together with the core interpreter itself.
Back in those days, all packages were source distributions; wheel distributions only came along a lot later, with PEP-427 in 2012. distutils was created back when only a few highly proficient people did the packaging. It is very flexible and imperative: you write a Python script that can customize every step of the package generation process. The downside, though, is that it is anything but easy to learn and understand. This became more and more of an issue as Python grew in popularity and ever more of its users were less proficient in its inner workings.
Build requirements
To install a source distribution, pip mostly did the following:
- discover the package,
- download the source distribution and extract it,
- run python setup.py install in the extracted folder (which does a build + install).
Developers ran python setup.py sdist to generate the distribution and python setup.py upload to upload it to a central repository (the upload command was deprecated in 2013 in favor of the twine tool, most notably because upload used a non-secure HTTP connection, and because it always performed a fresh build, not allowing the end-user to inspect the generated package before the actual upload).
When pip ran python setup.py install, it did so with the Python interpreter it was installing the package for. As such, the build operation had access to all the third-party packages already available inside that interpreter. Most notably, it used exactly the setuptools version installed on the host Python interpreter. If a package used a setuptools feature available only in a newer release than the one installed, the only way to complete the installation was to first update the installed setuptools.
This can cause problems if a new release contains a bug that breaks other packages, and it is especially troublesome on systems where users can't alter the installed packages. There was also the problem of what happens when the builder (e.g., setuptools) wants to use other helper packages, such as cython. If any of these helpers were missing, the build usually just broke with a failed-to-import error:
File "setup_build.py", line 99, in run
from Cython.Build import cythonize
ImportError: No module named Cython.Build
There was no way for developers to declare such build dependencies. It also meant that users had to install all the packaging build dependencies even if they did not need any of them at runtime. To solve this issue, PEP-518 was created.
The idea is that instead of using the host Python with its currently installed packages for the build, packages gain the ability to be explicit about what they need for their build operation. Instead of making these dependencies available in the host Python, we create an isolated Python environment (think of a virtual environment) and run the packaging commands in it.
python setup.py install now becomes:
- create a temporary folder,
- create a Python environment isolated from the third-party site-packages (python -m virtualenv our_build_env); let's refer to this Python executable as python_isolated,
- install the build dependencies,
- generate a wheel we can install via python_isolated setup.py bdist_wheel,
- extract the wheel into the site-packages of python.
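To make this flow more concrete, here is a minimal, simplified sketch of the steps above. This is not pip's actual implementation; the use of the venv module and the folder names are illustrative assumptions:

import subprocess
import tempfile
import venv
from pathlib import Path

def isolated_build(source_dir, requires):
    # a rough sketch of a PEP-518 style isolated build, not pip's real code
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "build_env"
        # isolated from the host's third-party site-packages
        venv.create(env_dir, with_pip=True)
        python_isolated = env_dir / "bin" / "python"  # Scripts\python.exe on Windows
        # provision the build dependencies declared in pyproject.toml
        subprocess.check_call([str(python_isolated), "-m", "pip", "install", *requires])
        # build the wheel with the isolated interpreter (legacy setup.py interface)
        subprocess.check_call([str(python_isolated), "setup.py", "bdist_wheel"], cwd=source_dir)

isolated_build(".", ["setuptools>=44", "wheel>=0.30.0", "cython>=0.29.4"])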
With this we can install packages that depend on cython without actually installing cython inside the runtime Python environment. The file and method for specifying the build dependencies is the pyproject.toml metadata file:
[build-system]
requires = [
"setuptools>=44",
"wheel>=0.30.0",
"cython>=0.29.4",
]
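For illustration, this is roughly how a frontend can read that declaration. The tomllib module shown here is in the standard library only from Python 3.11 onward; the tomli backport offers the same API on older versions:

import tomllib  # Python 3.11+; on older versions: import tomli as tomllib

# tomllib requires the file to be opened in binary mode
with open("pyproject.toml", "rb") as f:
    pyproject = tomllib.load(f)

print(pyproject["build-system"]["requires"])
# ['setuptools>=44', 'wheel>=0.30.0', 'cython>=0.29.4']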
Furthermore, this also allows whoever does the packaging to be explicit about the minimum versions they require for the packaging, and these can be provisioned quickly and transparently by pip on the user's machine.
The same mechanism can also be used when generating the source distribution or the wheel on the developer's machine. When one invokes the pip wheel . --no-deps command, it automatically creates in the background an isolated Python environment satisfying the build system's dependencies, and then calls the python setup.py bdist_wheel or python setup.py sdist command inside that environment.
Packaging tool diversity
Now there's one more problem here, though. Note that all these operations still go through the mechanism introduced twenty years ago, namely executing setup.py. The whole ecosystem still builds on top of the distutils and setuptools interface, which cannot change much because it has to preserve backward compatibility.
Also, executing arbitrary user-side Python code during packaging is dangerous, leading to subtle errors that are hard to debug for less experienced users. Imperative build systems were great for flexibility twenty years ago, when we were not aware of all the use cases. Now that we have a good understanding of them, we can build robust and easy-to-use package builders for the various use cases.
To quote Paul Ganssle (maintainer of setuptools and dateutil) on this:
Ideally, the default option would be a declarative build configuration that works well for the 99% case, with an option to fall back to an imperative system when you need the flexibility. At this point, it’s feasible for us to move to a world where it’s considered a code smell if you find you need to reach for the imperative build options.
The biggest problem with setup.py is that most people use it declaratively, and when they use it imperatively, they tend to introduce bugs into the build system. One example of this: if you have a Python 2.7-only dependency, you may be tempted to specify it conditionally with sys.version in your setup.py, but sys.version only refers to the interpreter that executed the build; instead, you should be using the declarative environment markers for your install requirements.
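To make the quoted example concrete (my illustration, not Paul's, with pathlib2 as a stand-in dependency):

import sys

# Imperative (buggy): evaluated once, on the interpreter that runs the build,
# so a wheel built under Python 3 would never declare the dependency.
install_requires = ["pathlib2"] if sys.version_info < (3,) else []

# Declarative (correct): the environment marker travels with the package
# metadata and is evaluated at install time, on the target interpreter.
install_requires = ['pathlib2; python_version < "3"']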
flit proved this assumption correct with its introduction in 2015. It has become the favorite packaging tool for many newcomers to Python, helping new users avoid many foot guns. However, to get to this point, flit had to once again build on top of distutils/setuptools, which makes its implementation non-trivial and its codebase home to quite a few shim layers (it still generates a setup.py file, for example, for its source distributions).
It's time to free packaging from these shackles and to encourage people to build packaging tools that make packaging easy for their use cases, making setup.py the exception rather than the default.
setuptools plans to lead the way by offering setup.cfg as the only user interface, and once a PEP-517 system is in place, you should prefer it over setup.py for most cases. To avoid tying everything back to setuptools and distutils, and to facilitate the creation of new build backends, PEP-517 was created. It separates builders into a backend and a frontend. The frontend provides an isolated Python environment satisfying all the declared build dependencies; the backend provides hooks that the frontend can call from within that isolated environment to generate either a source distribution or a wheel.
Furthermore, instead of talking to the backend via the setup.py file and its commands, we move to Python modules and functions. Every packaging backend must provide a Python object API that implements, at a minimum, the two functions build_wheel and build_sdist. The API object is specified via the build-backend key in the pyproject.toml file:
[build-system]
requires = ["flit"]
build-backend = "flit.api:main"
The above effectively means that the frontend can get hold of the backend by running the following code inside the isolated Python environment:
import flit.api

backend = flit.api.main
# build a wheel via (the argument is the directory the wheel is put into)
backend.build_wheel("dist")
# build a source distribution via
backend.build_sdist("dist")
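The exact hook signatures are defined by PEP-517: a backend is, in essence, just a Python module exposing these functions. The skeleton below is a toy illustration of the required interface, not a working builder:

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # build a wheel into wheel_directory and return its file name
    ...  # compile and collect files, then write the .whl archive
    return "demo-1.0-py3-none-any.whl"

def build_sdist(sdist_directory, config_settings=None):
    # build a source distribution into sdist_directory and return its file name
    ...  # collect the source tree, then write the .tar.gz archive
    return "demo-1.0.tar.gz"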
It's up to each backend where and how it wants to expose its official API:
- flit does it via flit.buildapi
- setuptools provides two variants: setuptools.build_meta (on the why, read on later)
- poetry does it via poetry.masonry.api
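Note that the build-backend value is just a string, so a frontend can resolve it with a plain import; the module.path:object.path syntax comes from PEP-517. A rough sketch of that resolution step:

import importlib

def load_build_backend(spec):
    # resolve a PEP-517 build-backend string such as "flit.api:main"
    module_name, _, attr_path = spec.partition(":")
    backend = importlib.import_module(module_name)
    # the part after ":" may be a dotted attribute path on the module
    for attr in attr_path.split(".") if attr_path else []:
        backend = getattr(backend, attr)
    return backend

backend = load_build_backend("flit.api:main")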
With this, we can start having packaging tools on the frontend side that are no longer bound to the legacy decisions of distutils.
tox and packaging
tox is a testing tool, used by most projects to ensure the compatibility of a given package against multiple Python interpreter versions. It also allows the quick creation of Python environments with the package under inspection installed in them, making it faster to reproduce failures.
To test a package, though, tox first needs to build a source distribution of it. While both PEP-518 and PEP-517 should make things better, turning them on can break packaging for some use cases. Therefore, when tox added isolated builds in version 3.3.0, it chose not to enable them by default for now; you need to turn them on manually (they will be enabled by default in version 4, due sometime later this year, 2021).
Once you've specified a pyproject.toml with appropriate requires and build-backend values, you need to turn on the isolated_build flag inside tox.ini:
[tox]
isolated_build = True
After this, in the packaging phase tox will build the source distribution by provisioning the build dependencies into an isolated Python environment as per PEP-518, and will afterward call the build backend as specified by PEP-517. Otherwise, tox falls back to the old way of building source distributions: invoking the python setup.py sdist command with the same interpreter tox is installed into.
Conclusion
The Python Packaging Authority hopes that all this will result in more user-friendly, error-proof, and stable builds. The specifications for these standards were written up and debated in long threads between 2015 and 2017. The proposals were deemed good enough to benefit most users, though some less mainstream use cases could have been overlooked.
If your use case is one of those, don't worry: the PEPs are open to enhancement at any point, should we deem it required. In the next post of this series I'll go over some of the pain points the community bumped into while rolling out these two PEPs. These should serve as lessons learned, and show that there's still some work to be done; not everything is perfect yet. However, we're getting better. Join the packaging community if you can help out, and let's make things better together!