Good Comment, Bad Comment

How the meme of "clean code has no comments" became a cancer in software engineering

Mar 09, 2024

In my job as hiring manager for my startup Vom, I have graded countless take-home assignments from early-career software engineers. Though they varied greatly in most dimensions, they were similar in one regard: despite being thousands of lines long, they had almost no comments [1].

What can explain this puzzling observation? How did it affect code quality? And why does it matter? Today on David Reis on Software.

Nobody Comments their Code

I have read and reflected on code for a living for more than 20 years, and the code in the responses to my take-home assignment was, let’s say, not impressive. A few responses amazed me overall with their breadth, craft, and understanding of the problem, but even the great ones did not contain Good Code™.

In some sense, this is not surprising. The candidates had an average of a couple of years of experience, and writing good code is just too damn hard. Writing zero comments, though, was a major self-own. With comments, their code could at least have been average.

This puzzling observation requires explanation, and I hope that by providing it, you, dear reader, can improve your own code and learn a bit about memetics. Maybe you will even start writing comments, inshallah!

Y U NO COMMENT

When a candidate delivers a reasonable answer to my assignment, I schedule a pair programming session with them. One question I always ask is “Why didn’t you write any comments?” The answer is invariably two words: “Clean Code”. That is, they were following the supposed advice of the best-selling programming book “Clean Code” by Bob Martin (AKA Uncle Bob), which they summarize as “good code has no comments” [2].

But does the book really say that?

No, “Clean Code” obviously does not say good code has no comments. It says instead that code without comments that is just as readable is superior to the commented version.

That is, since code is meant for humans to read, it’s readability that is the most important imperative. My clearest statement of this principle is:

Code Should be Obvious

Was the candidates’ code obvious? No. Let me give a single example [3]. This is a node in a graph written in Python:

T = TypeVar("T")

class NodeTypes(str, Enum):
    START = "start"
    IF = "if"
    SET = "set"

class Node(BaseModel, Generic[T]):
    model_config = ConfigDict(extra="allow")

    id: str
    data: T
    type: NodeTypes | str | None

There are several mysteries in this short snippet:

Why did they use a generic parameter? Why not Any, or a concrete type?
What concept does NodeTypes represent? And how are the members instances of that concept?
Why is extra set to "allow"?
Why can type be a string in addition to the enum value? Why can it be None?
What are the constraints on id, and how is it defined?

Eight doubts in ten lines of code. This is indeed not Good Code™, and I assume Uncle Bob would not find it clean either.

Good Code™: The Elusive Eldorado

Ideally, good code should be self-evident without the need for comments. Unfortunately, that platonic ideal can only ever be approached, and to the extent it’s not reached, we must use comments to fill the gap. To give an idea of what a reasonable number of comments is, I counted the comment lines in my current Python codebase, including docstrings, and got 6%, or about one comment line for every seventeen lines of code [4].
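If you want a comparable number for your own codebase, here is a minimal sketch of one way to count. It is not the script behind the figure above, and the definition of "comment line" is an assumption: whole-line `#` comments plus bare string statements (docstrings and strings placed under a field or constant), divided by non-blank lines. The function name `comment_line_ratio` and the `SAMPLE` source are made up for illustration.

```python
import ast
import io
import tokenize


def comment_line_ratio(source: str) -> float:
    """Fraction of non-blank lines that are whole-line comments or bare strings."""
    lines = source.splitlines()
    non_blank = {i for i, line in enumerate(lines, start=1) if line.strip()}
    comment_lines: set[int] = set()

    # Whole-line '#' comments (trailing comments after code are not counted).
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.line.lstrip().startswith("#"):
            comment_lines.add(tok.start[0])

    # Bare string statements: docstrings and strings placed under a field
    # or constant.
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            comment_lines.update(range(node.lineno, node.end_lineno + 1))

    # Blank lines inside a multi-line string should not inflate the count.
    comment_lines &= non_blank
    return len(comment_lines) / len(non_blank) if non_blank else 0.0


# Hypothetical sample source, used only to exercise the function.
SAMPLE = '''\
"""Module docstring."""

CHART_HEIGHT = 350  # pixels
"""
Chosen to fit three charts on a phone screen.
"""

def area(w, h):
    # Width times height.
    return w * h
'''

if __name__ == "__main__":
    print(f"{comment_line_ratio(SAMPLE):.1%}")
```

Run over every file in a repository and weighted by line count, this gives a rough ratio. Different counting rules (for example, including trailing comments) will move the number a little.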

Here are some principles I follow when using comments to make my code obvious.

I always comment data model classes and every field in them. Even code that might appear obvious is full of mystery when you really think about it. For example, consider this code:

@dataclass
class User:
    id: str
    groups: list[str] | None

Seems clear enough, but compare with:

@dataclass
class User:
  """
  Users as known by the Wingman system. Notice Wingman does not own or
  control users, as that is Kilmer’s role (see
  http://go/kilmer-user-management). You can think of this entity
  as a small Wingman side table to Kilmer’s users.  
  """

  id: str
  """
  Twelve character nanoid (http://github.com/ai/nanoid). The length
  was chosen such that the chance of collisions is <1% in over 100
  years at our normal QPS of 2. See
  https://zelark.github.io/nano-id-cc/.
  We chose nanoid over UUID to keep the id short and URL-Safe.
  """

  groups: list[str] | None
  """
  Normalized names of the groups the user belongs to. Notice that
  groups don’t show up in this field instantaneously when the user
  is added to a group, but only after Galactus replicator has run
  (see go/galactus-design#replication). None means the user does not
  participate in the communities feature.
  """

See how many mysteries were lurking? All model code without comments is like that.

Imagine you were a newbie on the team that owns the code. Which of these two versions would you prefer? Which version would make you ramp up faster? Now imagine that your company has considerable turnover. Which version would help avoid having to throw away and rewrite the code, since no one can maintain it? [5]

And that brings me to the second principle: I comment on anything that would be necessary for someone on their first day to understand the code. For example, instead of:

CHART_HEIGHT = 350

I write:

CHART_HEIGHT = 350
"""
Chosen to fit three charts on an average phone screen as of 2023.
Ideally, we’d like to fit four, but in phones with normal DPI that
makes it hard for users with low vision to read legend text.
"""

This example shows that it’s usually the why that is missing from comment-less code. To add it, self-review the code and try to think like a newbie. What would they ask? Then just answer those questions in the code.
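Here is a small sketch of that self-review applied to a function. Everything in it is hypothetical: the function `visible_chart_count`, its parameters, and the rationales in the comments are invented to show the shape of the exercise, where each comment answers a question a newcomer would plausibly ask.

```python
def visible_chart_count(screen_height_px: int, chart_height_px: int = 350) -> int:
    """Number of charts to render on the first screen (hypothetical helper)."""
    # Why floor division? A chart that is cut off at the bottom of the
    # screen is not counted as visible, so partial fits are rounded down.
    full_charts = screen_height_px // chart_height_px

    # Why at least one? On very short screens we still render a single
    # chart and let the user scroll, rather than showing an empty page.
    return max(1, full_charts)


if __name__ == "__main__":
    print(visible_chart_count(1050))
    print(visible_chart_count(200))
```

Without the two comments the body is a single short expression, and both questions would have to be answered by asking whoever wrote it.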

How to Remove Comments from Code

Like Uncle Bob preaches, code can become better by removing comments. You just need to know how to do it.

Omit redundant comments

This code is just as informative, without the comment:

@dataclass
class User:
  ...
  createdAt: datetime
  """Date the user was created."""

Make names more semantic

Names can be whatever you want, so make them as rich as possible, embedding rationales and context that make it easy to understand why the code is doing what it’s doing. For example, instead of:

users = self._get_users()
user = [user for user in users if user.date is None]

Write:

users = self._fetch_users_from_wingman()
users_to_be_synced = [user for user in users if user.date is None]

The Anatomy of a Meme

I’m sure Uncle Bob did not intend for all early-career engineers to write zero comments, so how did he have that effect? Memetics [6] can explain it.

An idea (i.e. meme) does not spread because it’s true, but because it causes someone to transmit it. Like genes, that transmission includes mutation, and mutated versions that are more likely to spread will win over more faithful but less viral versions.

Here’s how I think the “write no comments” meme spread:

Uncle Bob wrote about a true but subtle idea: “don’t write redundant comments, and rewrite code to be just as readable with fewer comments”.
Though wise, that is a complex idea that a lot of people did not understand fully. Some people understood it as “don’t write comments” and spread it in that mutated form.
Writing good comments is hard. It takes time and requires the burdensome work of taking the perspective of newbies. Since people are lazy [7], they would prefer to avoid all that work, if possible. How convenient, then, that someone as prestigious as Uncle Bob advises to “write no comments”!

If I’m right, I take two lessons from this story.

First, you should look at the original, richer, more complex version of a bastardized idea. Though less catchy, it’s likely to be more truthful.

Second, you should be suspicious of ideas that are too convenient: ideas that, if true, mean for example that you don’t need to work so hard [8]. Notice that I’m not saying such ideas are false, only that since they spread more easily, you should apply a more rigorous burden of proof to them.

It’s also interesting to think about what responsibility Uncle Bob has for releasing such a toxic meme into the wild. Certainly, he can’t be fully responsible, as he never said “write no comments”. On the other hand, he chose the framing of the idea and had a key role in spreading it, so he bears some responsibility. For example, he wrote [9]:

"Every time you write a comment, you should grimace and feel the failure of your ability of expression"

That is true, but you can see how such strong language can help a lazy person (i.e. all of us) self-justify writing no comments. J’accuse, Uncle Bob [10]!

In Conclusion

Above all, make your code obvious. If needed, use comments for that, but try to learn how to need fewer comments over time. It will take decades.

When evaluating ideas, be especially suspicious of viral memes that reproduce by preying on human vices, like laziness or envy. At the same time, frame your ideas in resilient forms that can spread without losing their core of truth. Don’t be like Uncle Bob.

[1] The most common comment count was zero. Way behind, the second most common was one, and I remember no case that used more than five.

[2] I'm paraphrasing. When people have an idea like that, they will not verbalize it in its most extreme version, since it becomes obviously wrong. By talking more and watching their actions, you can tell what they really think.

[3] This comes verbatim from a response to my take-home assignment written by a Senior Engineer with 6 years of experience. Also consider that it’s reasonable to expect people will try writing the absolute best code they can for a hiring test.

[4] About two-thirds of those are class, field, or method comments. Good comments on that kind of interface-like code go a long way to make a codebase understandable. Unfortunately, it’s also the hardest code to write good comments for. He who has never commented userId with “id of the user” can throw the first stone.

[5] Notice that such mysterious code is only maintainable at all because some companies do an okay job of transmitting knowledge from one person to the next, via pair programming, code review, etc. While this can work fine in some contexts, the near future will bring a new disadvantage to that way of working. All that knowledge is absent from the code, and that makes it invisible to Coding Agents, like Copilot and ChatGPT. Work on properly documented codebases will get a big productivity boost from these agents, while poorly documented codebases will lag behind.

[6] Memetics is the study of how ideas spread and evolve, by analogy with the biological transmission and evolution of genes. It treats information as replicable units called memes. See https://en.wikipedia.org/wiki/Memetics.

[7] Lazy in the technical sense of preferring automatic, low-effort forms of reasoning to careful logical ones. The definitive work on that distinction is “Thinking, Fast and Slow” by Daniel Kahneman.

[8] For example: it’s easy to get rich with the latest memecoin, or ice cream is actually good for you.

[9] See https://www.goodreads.com/work/quotes/18312943-the-robert-c-martin-clean-code-collection.

[10] See https://en.wiktionary.org/wiki/j%27accuse.