9 reasons your technical (Python) documentation sucks

  • June 14, 2022

This article was originally presented as a “brown bag talk”, an internal series of talks where Praelexis employees meet over lunch to share and discuss technical topics that interest them.

In today’s post we’re going to (briefly) explore one of my favourite topics: technical documentation. We’re going to look at some reasons why your technical documentation sucks, and, along the way, touch on how we can maybe make things better. The post is a touch hyperbolic and tongue-in-cheek, but I can assure you that the lessons contained within are not — they are extremely valuable and useful to teams of one, as well as teams of one thousand.

Also, while non-technical folks can learn something here, this post is squarely aimed at the technical / developer crowd, and some points apply specifically to Python.

Without further delay, here’s why your documentation sucks.

1. You think you don’t need documentation

Documentation is like [pizza]. Even when it’s bad, it’s better than nothing.

Someone on the internet, probably.

You 100% absolutely need to document your code. No matter how good your product, library, or tool is, if your documentation sucks, people aren’t going to use your tool. Period. It’s that simple. If you force people to use your tool without good documentation, they won’t just be ineffective — they’ll also dislike you. Worst of all, if you’re forced to work on your own codebase without excellent documentation, you’ll begin disliking yourself.

Not having documentation is like trying to wander around in a dark cave without a torch — it’s torturous (ha!). All you do is get lost and confused. It leads to frustration and misunderstanding how things actually work, particularly if you encounter a project for the first time (which is the same as coming back to a project a few months later).

If you think you don’t need documentation, then your documentation sucks. And not having documentation also means your documentation sucks.

2. You think documentation only refers to docstrings and comments

This is a common mistake I see beginner programmers make. But what about the following:

  • Good variable names (especially if you work in a dynamically-typed language!)
  • Clean code
  • Good tests
  • Good examples (technically often considered documentation, but still!)

Anything that helps people understand the behaviour and intention of your code is documentation!

An example of descriptive variable names:
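
As a minimal sketch (the function names and the discount calculation are hypothetical, chosen only for illustration), compare two versions of the same logic. The second needs no comment to explain what goes in or what comes out:

```python
# Hypothetical example: both functions compute exactly the same value.

def calc(d, r):
    # Terse names force the reader to guess what d and r mean.
    return d * (1 - r)


def apply_discount(price_in_rand, discount_fraction):
    # Descriptive names document the unit (rand) and the expected
    # input (a fraction such as 0.25, not a percentage such as 25).
    discounted_price_in_rand = price_in_rand * (1 - discount_fraction)
    return discounted_price_in_rand
```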

 

And some good unit tests that document how things are supposed to behave:
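
A small sketch of what this could look like, assuming a hypothetical `slugify` helper (not taken from any real codebase). Each test name states one rule the function is expected to follow, so the test class reads like a specification:

```python
import unittest


def slugify(title):
    """Return a lowercase, hyphen-separated version of ``title``."""
    return "-".join(title.lower().split())


class TestSlugify(unittest.TestCase):
    def test_spaces_become_single_hyphens(self):
        self.assertEqual(slugify("Hello   World"), "hello-world")

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(slugify("  Hello World  "), "hello-world")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

Saved in a test module, these can be run with `python -m unittest`.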

(and please, remember to write tests!).

Use everything at your disposal to make your code easy to understand. Documentation extends quite far beyond docstrings and comments. Don’t ignore variable names, clean code and good tests. Otherwise your documentation will suck.

3. You don’t know how and when to use comments

Another common beginner mistake (this is often how you’re taught at university, so it might not be entirely your fault). Your team asks you to document your code. And you produce something like this:
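
A hypothetical illustration of the pattern (the function is made up for this purpose): every comment simply restates the line below it.

```python
# Hypothetical example of comments that only repeat the code.

def total_price(prices):
    # Set total to zero
    total = 0
    # Loop over the prices
    for price in prices:
        # Add the price to the total
        total += price
    # Return the total
    return total
```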

Now, I know your intentions are good when you write code like this. But I can read code. And guess what, your teammates can read code too.

Instead, use comments to explain something that is surprising or unexpected, or that is deliberate for a non-obvious reason. For example, take a look at this snippet from one of our Django codebases:

 

If you’re not familiar with Django, you might not know that you can build up a single query by using the | operator on multiple Q objects, instead of step-by-step narrowing down your selection. It’s a neat little trick, so the developer added a comment (and a reference to a StackOverflow question!) to explain what the intention is, and why it’s there.

Take another example:

When modifying a session in Django’s test client, the session will not be saved unless it is assigned to a variable first, because a new session object is created every time the session property is accessed. An unsuspecting developer might come along and think they can modify the session directly via self.client.session, but then later be confused about why the session isn’t saved. Again: use comments to explain things that are surprising!
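
The same principle applies outside Django. As a minimal sketch using only the standard library (the function name is hypothetical), the comment below explains a deliberate choice that a well-meaning reader might otherwise "simplify" into a bug:

```python
def unique_in_order(items):
    # dict.fromkeys() is used instead of set() on purpose: dicts keep
    # insertion order (guaranteed since Python 3.7), so duplicates are
    # dropped without reordering the remaining items. A set would not
    # preserve the original order.
    return list(dict.fromkeys(items))
```

The comment says nothing about what the line does; it records why the obvious alternative was rejected.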

And you don’t have to take it from me either:

A delicate matter, requiring taste and judgement. I tend to err on the side of eliminating comments, for several reasons. First, if the code is clear, and uses good type names and variable names, it should explain itself. Second, comments aren’t checked by the compiler, so there is no guarantee they’re right, especially after the code is modified. A misleading comment can be very confusing. Third, the issue of typography: comments clutter code.

Rob Pike, “Notes on Programming in C”

Use comments wisely, otherwise your documentation will suck.


4. You think documentation is a substitute for confusing code

This is almost the opposite case of Rule 2: your code works, but it’s poorly written and confusing. So you think to yourself, “Ah hah! Rather than fix this, I’ll just add some documentation to explain what’s happening. That should be ok.”

And you’d be wrong.

Take the following implementation of Fizz Buzz:
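
As a hypothetical illustration of the problem (a stand-in written for this purpose, not necessarily the snippet from the talk), consider a version that leans on an arithmetic trick and then tries to make up for it with a comment:

```python
def fizz_buzz(n):
    # Return "Fizz" for multiples of 3, "Buzz" for multiples of 5,
    # "FizzBuzz" for multiples of both, and the number as a string otherwise.
    return {1: str(n), 6: "Fizz", 10: "Buzz", 0: "FizzBuzz"}[n**4 % 15]
```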

From the comments, I understand what the code is supposed to do, but I have no clue how it actually works.

  • What happens if I need to modify its behaviour?
  • What happens if I forget how it works (you will eventually)?
  • What happens if it has a bug? How do I figure out what’s gone wrong?

As a side note, if you can’t write clean code, you probably won’t write good documentation either:

A common fallacy is to assume authors of incomprehensible code will somehow be able to express themselves lucidly and clearly in comments.

Kevlin Henney

So focus on your fundamentals! If you can’t write clean code, your code sucks.

But your documentation also probably sucks.

5. You don’t know about PEP-257

Our first Python-specific point 🐍.

A lot of Python developers know about PEP-8, which is the official style guide for Python code. What not a lot of Python developers know is that there is also a documentation equivalent: PEP-257.

It’s worth reading through PEP-257 (it’s not long!), but my favourite example is a rule I see even senior developers break: use the imperative mood in the first line of a docstring:

The docstring is a phrase ending in a period. It prescribes the function or method’s effect as a command (“Do this”, “Return that”), not as a description; e.g. don’t write “Returns the pathname …”.

Extract from PEP-257
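
A short sketch of the difference, using two hypothetical helper functions that behave identically and differ only in the mood of the docstring's first line:

```python
import os


def get_extension(path):
    """Return the file extension of ``path``, including the leading dot."""
    # Imperative mood ("Return ..."), as PEP-257 recommends.
    return os.path.splitext(path)[1]


def get_extension_descriptive(path):
    """Returns the file extension of ``path``, including the leading dot."""
    # Descriptive mood ("Returns ..."), which PEP-257 advises against.
    return os.path.splitext(path)[1]
```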

Follow PEP-257, otherwise your documentation sucks.

6. You don’t choose and follow a common docstring format

Luckily for us, the Python ecosystem is so large and mature that a number of really smart people have already spent a lot of time thinking about things like documentation. To date, there are three major docstring formats in Python that most projects will use:

  • reStructuredText
  • Numpydoc (my personal favourite)
  • Google Python Style Guide

Each style guide has a clear specification that you can (and should) follow.

Just because these docstring style guides exist doesn’t necessarily mean they’re good, but standardising on an accepted format nets you a few easy wins:

  • Documentation generators (e.g. Sphinx) will typically support at least one of the major formats.
  • Teammates will typically be familiar with one or more of the formats already.
  • It’s easy to find examples online.

Let’s take a look at what these three major formats look like:
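
As a rough sketch, here is the same hypothetical function documented three times, once per format. The function and its parameters are made up for illustration; consult each format's specification for the full set of supported sections.

```python
def scale_rst(values, factor=2.0):
    """Multiply every value by a constant factor.

    :param values: Numbers to scale.
    :type values: list[float]
    :param factor: Multiplier applied to each value.
    :type factor: float
    :returns: The scaled values, in the original order.
    :rtype: list[float]
    """
    return [value * factor for value in values]


def scale_numpydoc(values, factor=2.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, optional
        Multiplier applied to each value. Default is 2.0.

    Returns
    -------
    list of float
        The scaled values, in the original order.
    """
    return [value * factor for value in values]


def scale_google(values, factor=2.0):
    """Multiply every value by a constant factor.

    Args:
        values (list[float]): Numbers to scale.
        factor (float): Multiplier applied to each value.

    Returns:
        list[float]: The scaled values, in the original order.
    """
    return [value * factor for value in values]
```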

It doesn’t matter too much which format you choose, as long as you choose one and stick to it!

7. You choose the wrong docstring format

Except maybe it does matter which docstring format you choose?

I personally (fight me!) think that reStructuredText is the wrong format. Here’s why:

  • It’s rather “ugly” in the source code.
  • The specification is confusing (years later, I still need to look up how certain directives work)
  • It’s hard to find good Python examples online
  • It seems quite fragile to parse (this might be the side-effects of the confusing spec)

Fun fact: We standardised on reStructuredText at Praelexis. This was at a time before NumpyDoc or Google Python Style Guide were as popular as they are now. We’re in the process of considering moving to a more readable format, now that better things are available.


8. You do the bare minimum

Consider the following function (pulled again from one of our codebases):

And now compare that with the following:
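
To make the contrast concrete, here is a hypothetical stand-in (not the function from our codebase): the same moving-average helper documented twice, first with the bare minimum and then with the reasoning, edge cases and an example spelled out. The Numpydoc layout is an arbitrary choice for this sketch.

```python
def moving_average(values, window):
    """Compute the moving average."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def moving_average_documented(values, window):
    """Compute the simple moving average of a sequence.

    Each output element is the unweighted mean of ``window`` consecutive
    inputs. Only full windows are used, so no padding is applied and the
    result is ``window - 1`` elements shorter than the input.

    Parameters
    ----------
    values : list of float
        Input sequence, in the order the averages should follow.
    window : int
        Number of consecutive elements to average. Must be at least 1.

    Returns
    -------
    list of float
        One mean per full window. Empty if ``values`` is shorter than
        ``window``.

    Raises
    ------
    ValueError
        If ``window`` is smaller than 1.

    Examples
    --------
    >>> moving_average_documented([1, 2, 3, 4], window=2)
    [1.5, 2.5, 3.5]
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```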

Which would you rather have? I think it’s fairly obvious. More explanation, more reasoning, more exposed “thinking process” is almost universally preferred. You’ll see this echoed across all of the major Python projects: scikit-learn, numpy, pandas, etc. all often have more documentation than code. This isn’t an accident!

Some other points to keep in mind when going beyond the bare minimum:

  • Poor grammar and incoherent writing are signs of a poor thinking process and a lack of understanding.
  • Good documentation takes time, and requires effort. Do it anyway.
  • Deep expertise is not a prerequisite for good documentation.

Always do more than the bare minimum. You’ll be thanked by your teammates, users, and yourself.

If you do the bare minimum, your documentation sucks.

9. You’re unaware of documentation’s biggest flaw

Even if you assume the perfect build system (excellent unit tests, linting, style checks, and so on), you’ll still have the following issues with documentation:

  • Incorrect or bad documentation doesn’t cause the build to fail.
  • Out-of-date documentation doesn’t cause the build to fail.
  • You will forget to update the documentation once you’ve changed the code.

When it comes to docstrings and comments, documentation is code. But it’s code that doesn’t get executed or tested (yes, I know about doctests; they help with testing examples, but not with prose). As a result, you have to be extra vigilant and disciplined when it comes to maintaining your documentation. Make a point of revisiting it and making sure it’s still up to date and relevant. Otherwise you might inadvertently do the worst thing documentation can do (besides not existing): point someone in the wrong direction.
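
For the part that can be tested, here is a minimal doctest sketch (the conversion function is a hypothetical example). The `>>>` examples in the docstring are executed and compared with the output shown, while the prose sentence above them is never checked:

```python
import doctest


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    >>> celsius_to_fahrenheit(100)
    212.0
    >>> celsius_to_fahrenheit(-40)
    -40.0
    """
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Runs every docstring example in this module and reports mismatches.
    doctest.testmod()
```

If someone later changes the formula, the build can catch the stale examples, but a stale first sentence would still slip through.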

 
