3  Robust codebases: about safety and inspection

Anyone with some experience coding will have uttered the words “ok, finished coding, time to start debugging!” before starting a labyrinthine search for why their code isn’t behaving as expected, a search that can last much longer than it took to write the code in the first place. It is both weird and worrying that computer code is rarely expected to work right away, but such is the human condition: we are absolutely incapable of writing bug-free code, and AI seems to be just as good as we are at inserting little bugs that can be hard to track down. This chapter covers tools related to code safety and quality control: basically anything that allows you to avoid bugs as much as possible, or at least find them easily.

❯ Level 1

Use a proper debugger: pdb

The first step is to get rid of your “print-debugging” habits, this famous technique of inserting statements like print("I am here") or print("Still alive") in your code to find where it is breaking. I know “it works”, but taking 10 minutes to learn a proper debugger can save you hours of adding and removing those statements. The first tool we will see is the command-line debugger pdb, which is bundled with every python distribution, both because you can use it anywhere and because it will force us to understand the fundamentals. Afterwards, we will cover the graphical debugger of VSCode, which can be handier when available.

To launch it, the best way is to run your script with -m pdb inserted between the python command and the script name. For example, if you are executing a script with arguments, simply run:

python -m pdb my_script.py arg1 arg2

This will launch a debugging session that awaits instructions (for the moment, the script is not running). That is when you should use the following basic commands of the pdb program:

  • r or run: starts the script over and runs it until the first breakpoint or until it finishes.

  • b line_number or break line_number: sets a breakpoint at line line_number of the script. When the script reaches this line, it will stop and you will be able to inspect the variables. You can also set a breakpoint in a function by using break function_name (the function must already have been imported), or in another file by using break file_name:line_number.

  • c or continue: when you are stopped at a line, continues running the script until the next breakpoint or until it finishes.

  • n or next: runs the next line of the script and stops again. If the line is a function call, it executes the whole call.

  • s or step: similar to next, but if the line is a function call, it goes inside the function and stops at its first line.

  • p var or print var: prints the value of the variable var at the current point in the code. Note that only variables in the current scope are accessible; to access variables of the function that called the one you are stopped in, you will need to use up first.

  • w or where: shows where you are in the code in terms of the “call stack” (i.e. the list of function calls that stand between the main script and the current line of code you are stopped at). This comes in very handy because you can use it after an exception is raised to see exactly where it happened.

  • up and down: move you up and down the call stack, so you can access variables of the function that called the one you are stopped in, and go back down afterwards.

This is pretty much all you will need! The only thing I can recommend now is to start trying it on your code, and familiarize yourself with inserting breakpoints, moving around the code, and inspecting variables. Apart from the insertion of breakpoints, pdb is extremely easy to use and can shorten a debugging session by a lot.

A little technical addition about breakpoints: the hard part is adding them to files that are not in the same directory as the main script you are running. In that case, use the syntax break file_path:line_number, where file_path is a path relative to a directory that is in sys.path. For example, if a module is imported from the main script with from module.submodule import submodule_file, you would add a breakpoint in submodule_file.py with break module/submodule/submodule_file.py:line_number. Final note: pdb sadly does not handle multi-process or multi-threaded code.
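As a side note, since Python 3.7 you can also set a breakpoint directly from your code with the built-in breakpoint() function, which drops you into a pdb session at that exact spot without counting any line numbers. A minimal sketch (the call is commented out here so the script runs non-interactively):

```python
def mean(values: list[float]) -> float:
    total = 0.0
    for v in values:
        # Uncomment the next line to stop here and inspect `v` and `total` in pdb:
        # breakpoint()
        total += v
    return total / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0
```

Running the script normally (no -m pdb needed) will then pause at that line with the full pdb command set available.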

Use a proper debugger: VSCode debugger

A much friendlier way to debug, if you have the occasion, is to use an IDE debugger like the one in VSCode. Inserting breakpoints becomes trivial: simply click to the left of any line of code until a red dot appears, meaning a breakpoint is set. You can debug a script by going to Run > Start Debugging or by clicking the insect-with-a-play-button icon in the left toolbar. Once debugging, variables and the call stack are directly available on the left of your window, buttons at the top give you the possibility to continue, move to the next line, etc., while the bottom part of your window holds both the output and a “debug console” where you can type commands to examine variables in more detail (for example a specific element of an array). From this debug console you can even generate plots in the middle of a debugging session! Note also that the “Data Wrangler” extension gives you additional tools to examine dataframes or arrays visually during debugging. You can find videos on the VSCode website for a more visual explanation.

The only tricky part is debugging scripts with arguments. The gist is that you have to create a debug configuration in the form of a launch.json file, which is suggested in the Debug panel below the big “Run and Debug” button. There you can choose the “Python Debugger” then “Python File with Arguments” options, and you will see a new launch.json file in your window (technically, it is created in the .vscode hidden folder of your project; if you have opened an individual file with VSCode rather than a project folder, this will not work). It should look like this:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [

        {
            "name": "Python Debugger: Current File with Arguments",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "args": [
                "${command:pickArgs}"
            ]
        }
    ]
}

I won’t enter into the details, but you can simply leave everything untouched, go to the tab with the script you want to debug, and start the debugging session there (green play button on the left, or Run menu > Start Debugging). You will see a prompt asking for the command-line arguments you want to pass, and you will need to enter them every time you start a debugging session. This corresponds to the ${command:pickArgs} placeholder in the args variable of the json above. You can also replace this "args" value with a list of arguments if you don’t want to be prompted every time. Similarly, "program": "${file}" means the session will start with the file you are currently looking at, but it can be replaced with a script name. Note however that this config is valid for the whole project, so if you need to debug another script in your project, you will have to either change this config or add a new configuration and select it in the dropdown menu next to the green play button, at the top of the debug panel.

Alas, this “detail” of adding command-line arguments is not straightforward, but the rest of the VSCode debugger should be really intuitive to use, and I hope this will help you use it even in those annoying cases. Even if it’s still difficult, dealing with these configs is the type of problem AI tools like Copilot can help a lot with, and I wouldn’t be surprised if it was soon trivialised. To end this paragraph, I recommend you try a right-click on the space where breakpoints are inserted, and explore the possibilities of adding “conditional breakpoints” and “logpoints”. Finally, note that the VSCode debugger can handle multi-process and multi-threaded code, as opposed to pdb.

Logging

We all like to add print statements to our code to report on notable events (“saved a file here”), on things we should be cautious about but not enough to raise an error (“this line of the data was not as expected”), or to give hints as to how fast a program is advancing (the typical “iteration {i}/{n}” for every i out of 100, or the “time elapsed: {time.time() - t_start}” at the end of a long function call). This is all fine as long as you are the only one using your code, but it can become problematic as the codebase grows larger and other team members add their own statements, or it can simply be confusing for end users. A more graceful solution is to:

  • use print() statements only for information that you want to convey to all end users,
  • use the logging module that we are going to see now for everything else.

Essentially, the logging module gives fancy “print-like” statements that can be configured to be displayed in the console, written to a file, sent to a server, or partially silenced depending on their level of importance, and that will add information for you (notably a timestamp and the function and line number they are triggered from).

The best way to use it is to create and configure a logger object at the beginning of each python file, and then use its different methods to log information. For example:

import logging

# This creates a logger object that knows the name of the file it is in (__name__)
logger = logging.getLogger(__name__)

# ... Doing stuff
logger.debug("This is a message only for debugging.")
logger.info("This is a not-so-important message.")
logger.warning("This is a message that warns about something.")
logger.error("This is a message that indicates an error not critical enough to stop the program.")

If you run such a script, you will see that by default, only the warning and error messages are shown. This is easy to configure by adding at the beginning of your script the line logging.basicConfig(level=logging.DEBUG) or logging.basicConfig(level=logging.INFO) to show all messages down to the “info” or “debug” level. The basicConfig allows you to set some other useful parameters. Here is a recommended one:

import logging
import time

logging.basicConfig(
    level=logging.DEBUG,
    filename=f"logs/{time.strftime('%Y%m%d_%H%M%S')}.log",
    format="%(asctime)s - %(levelname)s - %(filename)s:%(funcName)s:%(lineno)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

Without entering into the details, this config will log to a file instead of the command line. The file will be in the logs/ directory (which must already exist) and be named after the date and time of the execution, great if you want to look at your records later! Of course, that’s only useful if you want to keep those logs; otherwise you can just remove the filename parameter and the logs will be displayed in the terminal.

The other very interesting line is the format one, which adds information to each call to a logging function: in this case the exact time (formatted by the specification in datefmt), the level of the message (from “DEBUG” to “ERROR”), the name of the file, the function, and the line number where the call was triggered, and finally the actual message that was passed to the logging function. All the variables that are available in the logging format string can be found in the corresponding paragraph of the documentation. This is all much more informative than your standard print statement!

Moreover, only one call to basicConfig needs to be made, typically in the entrypoint scripts of your codebase; all the other files will inherit the configuration and only need the line logger = logging.getLogger(__name__) at the beginning. This line, however, is mandatory in every file that logs anything.
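To make this inheritance mechanism concrete, here is a minimal self-contained sketch that simulates the two-file pattern in one script. The StringIO stream stands in for the terminal or a log file so we can show the captured output, and the force=True flag (available since Python 3.8) simply clears any previously installed handlers; everything else follows the pattern described above:

```python
import io
import logging

stream = io.StringIO()  # stands in for the terminal / log file, for demonstration
logging.basicConfig(    # called once, as in your entrypoint script
    stream=stream,
    level=logging.INFO,
    format="%(levelname)s - %(name)s - %(message)s",
    force=True,  # reset any pre-existing handlers (Python 3.8+)
)

# In every other file of the codebase, this single line is enough:
logger = logging.getLogger(__name__)
logger.info("Pipeline started")
logger.warning("Something looks off")

print(stream.getvalue())
# e.g. "INFO - __main__ - Pipeline started" then the WARNING line, when run as a script
```

Each module’s logger automatically carries its own module name in %(name)s, so you can tell at a glance where each message comes from.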

Random seeds

It is quite frequent for scientific code to use random numbers, but this poses an obvious problem for debugging and quality control, as well as for the reproducibility of certain results. This is why it is recommended to always offer the possibility of using a random seed whenever some code relies on randomness. This can be done at the beginning of your program, for each of the functions that provide pseudo-random numbers. For example, with this code:

import random
import numpy as np

seed = 123

def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)

if __name__ == "__main__":
    seed_everything(seed)
    print(random.random())
    print(random.choice(list(range(1000))))
    print(np.random.randn(2, 2))

You will see that the program gives the same output every time you run it, but the output will be different if you change the seed. Note that the outputs will also differ if you change the order of the calls to the random functions: that’s because behind the scenes, the random module and numpy each use a random number generator that changes its state every time a new random number is generated. This state can be set with the seeding methods at the beginning, but it also ends up depending on the history of calls made to random functions.
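A quick way to convince yourself that seeding works: reseed, draw, reseed again, and compare. This is a minimal check using only the standard random module (the same logic applies to numpy):

```python
import random

def draw_three(seed: int) -> list[float]:
    random.seed(seed)  # reset the hidden global state
    return [random.random() for _ in range(3)]

same = draw_three(123) == draw_three(123)  # same seed -> same sequence
diff = draw_three(123) == draw_three(456)  # different seed -> different sequence
print(same, diff)  # True False
```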

A different approach is to use the object-oriented interface of these modules: they each provide an object that represents and holds that internal state. This is useful if you want to have different random states in different parts of your code, or if you want to have a more explicit way to handle randomness. Here is an example:

import random

import numpy as np

seed = 123
rng = random.Random(seed)
rng2 = random.Random(seed)
np_rng = np.random.default_rng(seed)

if __name__ == "__main__":
    print(rng.random())
    print(rng2.random())
    print(np_rng.standard_normal((2, 2)))

Now if you add another call to rng.random() before rng2.random() is called, you will see it has no impact on the result of rng2.random() because they are using different internal states. This can be useful as your codebase grows and you have different modules that can be called or not depending on other parameters. For example, say you are training a neural network, and it has to be randomly initialized but you also have an option to train on a random subset of data. By separating the random number generator objects for the network and dataset modules you will ensure reproducibility even when adding the data subset feature.
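Here is a minimal sketch of that idea (the net_rng/data_rng names are illustrative): because each component owns its generator, adding draws to one leaves the other’s sequence untouched.

```python
import random

SEED = 0
net_rng = random.Random(SEED)   # e.g. owned by the network-initialization module
data_rng = random.Random(SEED)  # e.g. owned by the data-subsampling module

first_subset_draw = data_rng.random()

# The network module consumes many random numbers...
for _ in range(100):
    net_rng.random()

# ...but a fresh generator with the same seed still reproduces the data draw,
# proving the data module's sequence was unaffected:
print(first_subset_draw == random.Random(SEED).random())  # True
```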

❯❯ Level 2

Use a linter

You have probably often heard that code style is important for readability, and you may even know a few “guidelines” that are generally respected by the community of a given language: in python, for example, these include the use of snake_case for function and variable names but CamelCase for class names, the importance of leaving spaces around operators, etc. Most of these rules are gathered in style guides like the famous PEP8 for Python, which you may read, but it is very hard to remember all those rules and apply them consistently, let alone convince all your teammates to do so!

Fortunately, two kinds of tools can handle most of the code styling for you: linters and formatters. In the python universe, the three most famous tools that cover these use cases are flake8, black and ruff. Briefly:

  • flake8 is a linter: a tool that checks your code for bits that don’t respect the PEP8 style guide and gives you warnings. It is also designed to find simple bugs that can be caught just by looking at the code: for example a return statement outside of a function, or an if i == 1 test where i was never defined. You can check its page for instructions on how to install it, and then using it is often as simple as running flake8 my_project_directory/ in a terminal to check all the scripts in a directory. It can also be installed as a VSCode extension to see the style warnings directly as you edit your code.

  • black is a formatter: a tool that automatically reformats your code so that it complies with the PEP8 rules. On top of that, black is opinionated, so it adds a number of rules to make style even more consistent (described in the black code style). Yes, black will modify your code, but fear not: it only makes behavior-preserving changes (changes to whitespace or line breaks for instance, purely aesthetic changes that don’t impact what your code does). Again, you can check the official page for installation instructions and then run it simply as black my_project_directory/, and you can also install the official VSCode extension. This one is particularly handy when combined with a setting named editor.formatOnSave (find it in the settings, Cmd+, on Mac, Ctrl+, on Windows) that automatically runs the black formatter every time you save a file. That way you don’t even need to worry about style anymore, it will be handled automatically! Note however that black can’t solve all your style issues: for example, it won’t change your variable names.

  • ruff, finally, combines a linter and a formatter. It is also remarkably fast and has gained a lot of traction recently, so nowadays it would be my advised solution over the two above. You can get it here, and then run it in a directory with ruff check if you just want to lint, or ruff check --fix if you want to lint and fix the issues automatically. Just like flake8, it will also catch numerous potential bugs. Again, not all issues are fixable, so ruff will give you at the end a report of all the issues it could not fix (like variable names that use the wrong casing). ruff will however fix more issues than black (notably the order of imports, which otherwise needed to be handled with isort). Finally, ruff can also be installed as a VSCode extension (no need to combine it with flake8 and black in that case).

Using a tool like ruff will already make your code considerably cleaner, and the effects can be dramatic if all your teammates do so as well. However, it’s easy to forget to run it sometimes, or to miss one of the warnings. You will see how to get even more automated code styling in the section about pre-commit later in the chapter.

Type hints

You may have noticed that python gives you a lot of freedom with your variables: lists can contain all sorts of objects together, functions don’t check what you pass them… This gives the programmer a lot of flexibility and sometimes the possibility to find smart and compact solutions for problems that would require a lot of boilerplate in other languages. But this property is more often a curse than a blessing, for two reasons:

  • it makes the code harder to read and understand, since you have to “guess” the type of each variable or function argument,

  • it makes certain errors hard to catch, because python will not complain until errors are irrecoverable. So if you have a function for manipulating arrays of numbers and you pass it a list, it may well not raise any errors and simply give you absurd results later on! These types of hard-to-find bugs are familiar to anyone with a bit of experience in python.

This doesn’t happen in most big languages like C++ or Java, as those are statically typed, meaning all variables have a determined type, and if you try to assign a string to a variable that expects an array of numbers, an error will be raised during compilation. This makes compilation a harder hurdle to clear, but it is also an important safety measure, as it catches a lot of bugs before the code is even executed.

Fortunately, this possibility has now been brought to python, through type hints and static type checkers like mypy! In this paragraph we will talk about type hints.

Type hints are a way to add type information to your code, without changing the way python works. In a sense, you can think of them as “kind of” comments, since they will not be taken into account by the interpreter. They are only here to bring information to the people reading the code, and to the type checkers that will verify the validity of all types.

To add a type hint to a variable, simply add : and then the name of the type after the variable’s name. This can be done for function arguments, variables that appear for the first time, and class variables:

def fn(a: int, b: float) -> list[float]:
    c: list[float] = [i * b for i in range(a)]
    return c

class MyClass:
    a: int = 0

    def __init__(self, a: int, b: float):
        self.a = a
        self.b = b

This type of code is already easier to follow. Collections can easily be typed by adding the type of the elements between square brackets (for example list[float], or tuple[float, int], or dict[str, int]). If in a certain place you want to allow for more flexibility, you can easily use the Any type from the typing module:

from typing import Any

dict_of_stuff: dict[str, Any] = {"a": 1, "b": "hello", "c": [1, 2, 3]}

A handy feature for function arguments is union types, which can be defined with the | operator. This is particularly useful for optional arguments:

def plot_stuff(x: list[float], y: list[float], title: str | None = None) -> None:

In this case, title can either be a string or None. Things can get fancy quite easily with nested types and unions: dict[(str | int), dict[str, (list[float] | str | None)]]. To avoid types that are, again, hard to parse for humans, I encourage you to keep your variables simple, and if a certain variable ends up having a complex type, use a type alias, for example:

KeyType = str | int
ValueType = dict[str, (list[float] | str | None)]
MyDictType = dict[KeyType, ValueType]

Finally, note that not every single variable in your code needs a type hint to make for a readable codebase! Usually, the variables that are really important to annotate are function arguments and outputs, plus some complicated collections; the rest can in most cases be easily deduced.

You may now read the sections about dataclasses or pydantic of chapter 3 that required this knowledge of type hints.

Use mypy

The types above already significantly enhance your code’s readability, and they can make it easier to find bugs when reading the code, but so far their practical effect is about the same as that of a comment: python never looks at your type annotation, and it never checks that the argument a: int you pass to a function really is an int. You will see in chapter 3 one way through which type hints are actually enforced, the pydantic framework, and here we will see another one, the mypy tool.
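To see for yourself that the interpreter ignores annotations, run this:

```python
def double(x: int) -> int:
    return x * 2

# No runtime error despite the annotation: strings also support *,
# so python silently returns nonsense from the int-annotated function.
print(double("ab"))  # "abab" -- only a type checker would flag this call
```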

In statically typed languages like C or Java, as we have seen, the compiler needs to go through all your code to convert it to a form of machine code, and in that process it needs to take into account the type of every variable; so if one variable doesn’t have the announced type, it will notice and throw a compilation error before the program even has a chance of being run (except in rare occasions, like when the variable comes from user input, which is actually one of the main sources of security vulnerabilities, because types are that important). In python, however, there is no compilation step, so we are left with two choices: 1/ we could make the python interpreter check types at runtime, but that would make everything a little slower and would catch errors “too late”, once the program has reached the faulty line, when most of them could be caught just by looking at the code. Or 2/ we could add a “fake compilation” step that only serves to check that types are correct. Frameworks like pydantic apply solution 1 for certain specific cases (usually targeted at external inputs), while mypy is a tool that applies solution 2, adding a static type analysis check that can be run before launching your code and will verify that all types are consistent.

Using mypy is extremely simple: first you need to have it installed, then you need to have code properly annotated with type hints, and finally you can run:

mypy script.py

to check types in one of your python files.

As explained in their documentation, you can introduce a bug, and see mypy warn you before you even got a chance to run your code:

def greeting(name: str) -> str:
    return 'Hello ' + name

greeting(3)  # This will trigger a mypy alert when you run mypy

We said before that not every single variable needs to be explicitly typed. In fact, many variables have a type that can be naturally deduced from others, and mypy can perform this kind of deduction. You can check this for yourself:

def add_ones(l: list[float]) -> list[float]:
    res = []  # so far we can't guess much about res
    for x in l:  # here, mypy can guess x is a float
        res.append(x + 1)   # here, mypy guesses res can be a list of floats
    return res

add_ones([1, 2])

mypy can check that the above has no problems. It will however raise an error if you modify add_ones as follows:

def add_ones(l: list[float]) -> list[float]:
    res = []  # so far we can't guess much about res
    for x in l:  # here, mypy can guess x is a float
        res.append(x + 1)   # here, mypy guesses res can be a list of floats
    res.append("string")  # mypy will detect this shouldn't be here, as res is a list of floats.
    return res

In summary, you can trust mypy to do a lot of the heavy-lifting and catch many bugs if you only provide it with the most essential type annotations, notably function inputs and outputs.

Use pre-commit to automate checks

Ruff and mypy are fantastic for getting a clean codebase, but they have one big issue: you have to remember to run them! And even if you can configure VSCode to display warnings from those tools in your code, you could well be distracted and push code that isn’t properly checked, or you might have a less careful collaborator who does. A first step to automate the static analysis of your code is to use pre-commit hooks.

Basically, git can call hooks whenever a command is run, for example each time git commit is called, and this is used by a tool called pre-commit to let you automatically run some checks every time you call git commit, and cancel the commit if those checks do not pass. To use it, install it in your environment with pip install pre-commit, and then create a file in your project’s directory called .pre-commit-config.yaml (the dot at the beginning is important), which will contain the set of hooks to call as well as their configuration. Here’s a good example:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0  # Change to whatever is the latest version
    hooks:
      - id: check-added-large-files  # Prevent huge data files from being pushed
      - id: check-json  # linter for json files
      - id: check-merge-conflict  # checks no unresolved merge conflicts
      - id: check-yaml  # linter for yaml files
      - id: debug-statements  # checks no explicit debugger breakpoints
      - id: end-of-file-fixer  # Adds newline to every file
      - id: mixed-line-ending  # related to some line endings on Windows
      - id: trailing-whitespace  # Trims trailing whitespaces

  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.3.4  # Change to whatever is the latest version
    hooks:
      - id: ruff
        args: ["--fix"]  # Allow auto-fixing

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0  # Change to whatever is the latest version
    hooks:
      - id: mypy
        args: ["--strict"]  # Report situations where type can't be inferred

The above configuration will call three sets of tools before every commit. The official pre-commit-hooks repository contains a bunch of very useful checks; I have added a few standard ones above, but you can browse their repo to see which ones you find useful. To add a new check, simply add it on a line starting with - id: in the hooks list of that section of the file. The two other tools are ruff and mypy, which we have already seen. They will also be called automatically and block the commit if they report an error, so you can’t miss them anymore! In this particular case, ruff is in --fix mode, so it will fix whatever it can (but remember it can’t always fix all the errors it sees by itself), and mypy is in --strict mode, meaning that it will not allow untyped functions or variables whose type it can’t infer, unless a # type: ignore comment is added on the corresponding lines.

To finish setting this up, you will need to run pre-commit install once in your project’s directory so the hooks get attached to git; they will then run automatically every time git commit is run. Note however that they only run on files that have been git added. You can also run your checks without committing with the command pre-commit run (add --all-files to run on all files, including those that are not staged). Finally, if you ever need to bypass the checks for exceptional reasons (for example to quickly back up some work in progress), you can still run git commit --no-verify, but that should obviously remain extremely rare.

Run automated tests with pytest

Not all errors can be caught by linters and type checkers; in fact, only a small number of basic mistakes can. These automated tools can’t guess how you want the logic of your code to behave, so to avoid errors there you will have to resort to another well-known tool: the unit test. A unit test is a very simple piece of code that calls one of your functions with some chosen arguments and checks that the output is what you expect. It’s basically what you would do anyway in a terminal to check that your function works, but it’s automated and can be run every time you make a change to your code.

Python ships with a built-in framework called unittest, but pytest seems to be the most popular one, and the one I would recommend. Here’s a quick example: let’s write a function that finds whether an integer is prime or not. Let’s write a first version in a file called primes.py:

from math import sqrt

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, int(sqrt(n))):
        if n % i == 0:
            return False
    return True

Now, to test this function with pytest, we need to write a test function in a different file. The test must be in a file whose name starts with test_, and the test function’s name should also start with test_. For example, you can create a file called test_primes.py with the following content:

from primes import is_prime

def test_is_prime() -> None:
    assert is_prime(2) == True
    assert is_prime(3) == True
    assert is_prime(4) == False

(in a bigger project we would generally put all tests in a tests/ directory).

Then, after you have installed pytest of course (pip install pytest), you can simply run the following command in your project’s directory:

pytest -v

(-v is just for verbose output). This will automatically run all the functions starting with test_ in all the files whose names start with test_, and check that the assertions are true. If one of them is false, the test fails and you will see a report about it. You can see that the above test fails, and you can see exactly where: at the line assert is_prime(4) == False, because indeed there is a little mistake in the is_prime function. Can you find it? 😉 I will let the reader fix this function and run the test again to see it pass.

The great thing about pytest is that you can use it immediately without much further knowledge, since it simply exploits python assert statements. It also has a few features that allow you to write tests a bit more easily; one of them is the parametrize decorator. An equivalent way to write the above test with it would be:

import pytest
from primes import is_prime

@pytest.mark.parametrize(
    "n, expected", 
    [(2, True), (3, True), (4, False)]
)
def test_is_prime(n: int, expected: bool) -> None:
    assert is_prime(n) == expected

Basically, in the above code we added two arguments to the test function, and in the decorator we pass a list of tuples that will be successively used as arguments to the function. The first argument of the decorator specifies which function arguments the tuple elements map to. This makes it easy to write a test that tries many input-output pairs.
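Each tuple becomes a separate test case in pytest’s report. If you want readable names for the cases, parametrize also accepts an optional ids argument. Here is a small sketch with a toy function (collatz_step is made up for illustration, it is not part of primes.py):

```python
import pytest

def collatz_step(n: int) -> int:
    # Toy function to test: one step of the Collatz iteration
    return n // 2 if n % 2 == 0 else 3 * n + 1

@pytest.mark.parametrize(
    "n, expected",
    [(4, 2), (5, 16), (1, 4)],
    ids=["even", "odd", "one"],  # shown as test_collatz_step[even], etc. in pytest -v
)
def test_collatz_step(n: int, expected: int) -> None:
    assert collatz_step(n) == expected
```

With -v, pytest will list each case separately under its id, which makes failure reports much easier to read than an anonymous index.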

Additional interesting features of pytest include its command-line arguments, notably:

pytest -k "expression"  # Run only tests whose name matches the expression
pytest --trace  # Run with the pdb debugger attached, and breaks at the beginning of every test
pytest --pdb   # Start pdb at the first failed test
pytest -s  # Print the standard output as tests are running and not at the end

Additionally, tests can be marked with the decorator pytest.mark.mark_name, provided you use a mark_name that is not a built-in one (like parametrize is); custom marks should also be registered in your pytest configuration to avoid warnings. Marks allow you to select or skip certain tests with the -m command-line flag. For example, if one of your tests is really slow, you can decorate it with @pytest.mark.slow and then run pytest -m "not slow" to run all tests except those having this mark. Even finer control can be achieved with the @pytest.mark.skipif decorator, which will skip a test if a certain condition is met (for example on a certain OS or python version, or if there is no GPU)!
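To make this concrete, here is a minimal sketch (the test bodies and the slow mark name are illustrative):

```python
import sys
import pytest

@pytest.mark.slow  # custom mark: deselect these tests with `pytest -m "not slow"`
def test_exhaustive_sum() -> None:
    # A deliberately heavy computation standing in for a slow test
    assert sum(range(10_000_000)) == 49999995000000

@pytest.mark.skipif(sys.platform == "win32", reason="requires a POSIX platform")
def test_posix_only() -> None:
    # Skipped entirely on Windows, runs normally elsewhere
    import os
    assert hasattr(os, "getuid")
```

The reason string of skipif is displayed in the test report, so it is worth making it descriptive.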

Write test fixtures

One difficulty with tests is that they run in isolation: each test function is an isolated component, which sometimes leads to code repetition. For example, if you have a few functions that act on data, you will need to load sample data or create some fake data in each unit test to check that those functions work. Test fixtures are functions that set up some resources before a test and clean them up after the test is done, but they have a peculiar syntax: each fixture is a function decorated with @pytest.fixture, and fixtures are invoked as arguments of the test functions where you want to use them. Concretely, for the example above, you could have a fixture sample_data that gives you some fake data, working like this:

import pytest
from statistics import mean

@pytest.fixture
def sample_data():
    return [1, 2, 3, 4]

def test_mean(sample_data):
    assert mean(sample_data) == 2.5

def test_max(sample_data):
    assert max(sample_data) == 4

It’s a little cumbersome, but if you run pytest on a file like this, it will work perfectly! Basically, when entering a test function, pytest looks at the names of its arguments and checks whether it has previously registered a fixture with the same name. If it has, it simply calls the fixture and passes the result as that argument.
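One detail worth knowing: fixtures can themselves request other fixtures as arguments, and pytest resolves them the same way. A small sketch (doubled_data is a hypothetical name):

```python
import pytest

@pytest.fixture
def sample_data():
    return [1, 2, 3, 4]

@pytest.fixture
def doubled_data(sample_data):
    # This fixture requests sample_data exactly like a test function would
    return [2 * x for x in sample_data]

def test_doubled(doubled_data):
    assert doubled_data == [2, 4, 6, 8]
```

This lets you build small, composable setup steps instead of one monolithic fixture.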

There are many interesting aspects to fixtures. One thing to know is that you can group them in a file called conftest.py in the same directory as your test files, and pytest will import it automatically. Another is that fixtures can also clean up resources by using the yield keyword instead of return, as in:

@pytest.fixture
def resource():
    # Setup
    resource = create_resource()
    yield resource
    # Teardown
    resource.cleanup()

The lines after the yield statement will then be executed once the test function has finished.
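As a concrete sketch of this pattern (the fixture name and file contents are hypothetical), here is a fixture that creates a real temporary file and removes it once the test is done:

```python
import os
import tempfile
import pytest

@pytest.fixture
def log_file():
    # Setup: create and open a real temporary file
    fd, path = tempfile.mkstemp(suffix=".log")
    handle = os.fdopen(fd, "w+")
    yield handle
    # Teardown: runs after the test finishes, even if it failed
    handle.close()
    os.remove(path)

def test_write_log(log_file):
    log_file.write("ok")
    log_file.seek(0)
    assert log_file.read() == "ok"
```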

You might also be interested in the default fixtures offered by pytest, notably capsys, to capture stdout and stderr if you want to check outputs, or tmp_path, to get a temporary directory that is destroyed after the end of the test, very useful for functions that interact with files.
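Both can be sketched quickly (save_greeting is a hypothetical function under test):

```python
def save_greeting(path):
    # Hypothetical function under test: writes a greeting to a file
    path.write_text("hello")

def test_save_greeting(tmp_path):
    # tmp_path is a pathlib.Path to a fresh temporary directory
    target = tmp_path / "greeting.txt"
    save_greeting(target)
    assert target.read_text() == "hello"

def test_banner(capsys):
    print("starting up")
    captured = capsys.readouterr()
    assert captured.out == "starting up\n"
```

No import is needed for these two fixtures: pytest injects them by name, just like your own fixtures.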

❯❯❯ Level 3

What is CI/CD?

CI/CD (Continuous Integration/Continuous Deployment) is a relatively recent concept in software engineering that vastly expanded the productivity of engineers, and one way to think about it is as a way to automate most of what isn’t writing code itself. Basically, things like running tests, checking compatibility with the existing codebase, building the code, releasing new versions, deploying new versions on servers, all these tasks are targets for automation in CI/CD. This was initially made possible by special tools like Travis or Jenkins, but since the end of the 2010s, the big code hosting platforms like Github or Gitlab have started to offer their own CI/CD tools, so if you are using Github, I strongly recommend directly using the free Github Actions feature.

Before diving into the details, let’s look a bit at the programming workflow under CI/CD:

- you have an idea, and create a new branch in your repository to work on it,
- after a few commits, you think your code is ready to be merged into the main branch, so you create a pull request,
- a CI/CD workflow that you have written in advance is triggered at that moment, and runs the linters, tests, etc. to ensure your code works and doesn’t break previous functionality. These tests run on Github/Gitlab’s virtual machines, but can be parametrized to simulate diverse environments (different OS, python versions, etc.) which is very convenient,
- at this point you merge into the main branch, and tests run again to ensure the merge happened seamlessly,
- if your code is versioned, another workflow can be triggered to build a new version of your software, as binaries, a pip package, or a docker image for instance,
- a version number can be assigned to this new version,
- and if you have servers, another workflow can be triggered to deploy this new version on your servers.

Very powerful pipelines for CI/CD exist in the industry, to scan for security vulnerabilities, to do gradual releases to users, test the reaction of users, etc.

The big advantage of CI/CD, like for most tools in this chapter, is the peace of mind it gives you once it is set up. Having linters and tests, for example, is only useful if everyone remembers to run them, but with CI/CD you don’t even need to worry about that. It’s like having a very meticulous colleague looking over your shoulder at all times: it can be reassuring, and it allows you to focus on other things!

Run CI/CD pipelines with GitHub Actions

We will stick here to a simple workflow that runs linters and pytest on a python project. We will assume that you have a project with pre-commit already set up, as well as some tests ready for pytest. Github’s CI/CD is configured by placing a .github/workflows directory at the root of your project, and writing yaml files in it that describe the steps of the workflow that you want to run. Using yaml can seem peculiar, but it’s essentially a way to do declarative programming over specific settings, and it’s a good fit for specifying the steps you want a particular tool to run. Here is a workflow that we could write, let’s call it lint-and-test.yaml:

name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]


permissions:
  contents: read

jobs:
  lint-and-test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .
          python -m pip install pre-commit

      - name: Run pre-commit hooks
        run: pre-commit run --all-files

      - name: Run tests with pytest
        run: pytest

You can see that it starts with some parameters; notably the on field describes the triggers: the workflow will be run by pushes and PRs targeting the main branch only. The rest is a list of jobs (here there is only one, lint-and-test), each composed of a running environment (ubuntu-latest, the OS this will run on) and a list of steps. The first two steps are standard: first check out (like cloning) the current repository, then install python. The third step is more interesting: it installs your project with its dependencies, plus the pre-commit tool itself. Finally, the last two steps are the main course, where pre-commit and pytest are run. You will see the results of the workflow as it runs in the “Actions” tab of your repository, or on the corresponding PR, and you can explore its logs if any error is reported. Again, it seems quite complicated like this, and Github Actions in general has a lot of features to learn about, but starting with a simple workflow like this one can already do wonders for code quality, and you can slowly experiment with it afterwards.
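Earlier we mentioned that workflows can simulate diverse environments. As a sketch (the OS and version lists are illustrative), the single job above could be extended with a strategy matrix, which runs the same steps once per combination:

```yaml
jobs:
  lint-and-test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      # ...the remaining steps are unchanged
```

Each combination appears as a separate entry in the “Actions” tab, so you can see at a glance which environment broke.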

Profiling with cProfile

🏗️​ Work in progress

Memory profiling

🏗️​ Work in progress


  1. the older syntax Optional[T] for T a type is equivalent to T | None and remains in use in many codebases.↩︎