David J.

Reputation: 32715

Experimental research findings on white space (for language design and style guides)?

What does experimental research say about white space in code? Let me be specific: I'm talking about cognitive studies that compare how quickly and effectively people can read and grasp visual information that comes across in different formats.

Let's say you were designing a new computer language and had to make some decisions that affect how the source code looks. Or you were simply writing a style guide for a new language and wanted to make recommendations. Relevant topics might be identifier style (snake_cased_identifiers vs. camelCaseIdentifiers / PascalCaseIdentifiers), horizontal indentation, documentation styles, or vertical spacing.

I'm intentionally asking this question in this way to avoid recommendations such as:

I don't want a flame war between people who support differing approaches; rather, I'd like to know what experimental research has to say about the matter. (And I don't expect any particular study to necessarily be completely 'objective' or 'neutral'.)

To give a 'squishier' motivation for this question: people appreciate white space in code, in written documents, and in art (music, for example). All of these fields place a big emphasis on the importance of space.

So, thanks, I'd appreciate hearing what the studies have to say. To be clear, I'm not ruling out the importance of style and art -- I actually would hope that the wisdom from these worlds will show up in experimental studies.

In summary, if you can, please touch on one or more of the following:

Upvotes: 16

Views: 802

Answers (2)

ThinkingStiff

Reputation: 65351

This is a highly subjective topic, but you can take some cues from 1000 years of typographic history.

Research has been done on whitespace in typography, but less so on code. Still, many of the basic findings on legibility and comprehension plausibly apply to code as well. This study, Reading Online Text: A Comparison of Four White Space Layouts, shows that proper vertical spacing with large margins increases comprehension but slows reading speed. For code, it seems safe to assume that comprehension is more important than speed, so you could reasonably argue that more space is better for code. But when you get into specifics like tab size and brace positioning, it gets highly subjective. In code, margins are indentation, paragraphs are functions and code blocks, and periods are line breaks, braces, parens, etc.

When you start asking programmers what format of whitespace is more readable, though, answers are all across the spectrum. The best you can do is look for generalities that seem to be universal.

Such as:

  • put whitespace before and after blocks of code, like functions and classes
  • put whitespace before and after logical groupings of code within blocks
  • long blocks of code with no vertical whitespace are more difficult to read
  • long blocks of code with no indentation are more difficult to read
  • long lines of code with no horizontal whitespace are more difficult to read

I think most programmers will agree with those statements.

Example (pseudo-code):

thisismore(difficult<toread)because,itsall-smushed{together}
this-is-less ( difficult < to-read ) because, it's-not-all - smushed { together }

To touch on your last four points:

Variable Naming:

This is as subjective as whitespace, but again you can look to typography for clues. Typography has serif fonts, capital letters starting sentences, periods ending sentences, and a space after periods. All of these things are to allow your eyes to transition between words and sentences. With variables, clarity is important, so they are often multi-word names. For your eyes to easily read them, something needs to alert them that a new word has started.

This is harder to read (for most people):

  • mylongvariablename

Than these:

  • my-long-variable-name
  • myLongVariableName
  • my_long_variable_name
  • MY_LONG_VARIABLE_NAME

Now which of those is best or most readable is subjective, and may always be. But what's important is that something separate the words.
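
To make the word-boundary point concrete, here is a rough sketch (my own illustration, not taken from any study). The helper name `splitIdentifier` is made up for this example. Each separated style can be split back into words mechanically, while the run-together name comes back as one undivided blob, which is roughly the problem your eyes face.

```javascript
// Hypothetical helper (not from any library): split an identifier into lowercase words.
// Handles kebab-case, snake_case, SCREAMING_SNAKE_CASE, and camelCase/PascalCase.
function splitIdentifier(name) {
    return name
        // mark each lowercase-or-digit to uppercase transition (a camelCase boundary)
        .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
        // hyphens, underscores, and the spaces inserted above all act as separators
        .split(/[-_\s]+/)
        .filter(function (word) { return word.length > 0; })
        .map(function (word) { return word.toLowerCase(); });
}

console.log(splitIdentifier('my-long-variable-name'));
console.log(splitIdentifier('myLongVariableName'));
console.log(splitIdentifier('my_long_variable_name'));
console.log(splitIdentifier('MY_LONG_VARIABLE_NAME'));
console.log(splitIdentifier('mylongvariablename')); // no separators, so a single "word"
```

The first four calls each yield the same four words; the last one cannot be split without a dictionary.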

Horizontal Indentation:

Code that is not indented at all is more difficult to read than code that is. Too small an indentation and your eyes have trouble distinguishing blocks. Too large and you waste space and add no more clarity. Somewhere between four and eight spaces seems to be right based on the eleventy-bazillion lines of code written using those sizes.
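
As a quick illustration (a made-up example; the function names are hypothetical), here is the same logic written twice. Both versions run and return the same result; only the indentation differs.

```javascript
// No indentation: the nesting has to be reconstructed from the braces alone.
function countPositiveFlat(numbers) {
var count = 0;
for (var i = 0; i < numbers.length; i++) {
if (numbers[i] > 0) {
count++;
}
}
return count;
}

// Four-space indentation: each nesting level is visible at a glance.
function countPositive(numbers) {
    var count = 0;
    for (var i = 0; i < numbers.length; i++) {
        if (numbers[i] > 0) {
            count++;
        }
    }
    return count;
}

console.log(countPositiveFlat([3, -1, 0, 7]), countPositive([3, -1, 0, 7]));
```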

Horizontal Alignment:

Again, typography to the rescue. Lists of things aligned in columns are easier to read. For list item data that are longer than one or two words or numbers (like sentences), bulleted lists are used. For textual data, left-aligned columns are used. For numeric data, right-aligned columns are used. You can apply these principles to code. Bulleted lists can be seen as code-blocks, all aligned to the same indentation level. Variables are textual data, so left alignment would be easiest to read. If the values you were assigning were numeric, right alignment would be best.

This is more difficult to read (for most people):

var oneVariable = 10023, a = 370,
p = 4,
answerToLife = 42,
openThePodBayDoorHal = false;

Than this:

var oneVariable = 10023, 
    a = 370,
    p = 4,
    answerToLife = 42,
    openThePodBayDoorHal = false;

And this is probably easiest:

var oneVariable          = 10023, 
    a                    =   370,
    p                    =     4,
    answerToLife         =    42,
    openThePodBayDoorHal = false;
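
If you wanted to produce that last layout mechanically, something like the following sketch would do it. This is only an illustration under my own assumptions: `alignDeclarations` is a hypothetical helper, names are left-aligned with `padEnd`, and values are right-aligned with `padStart`.

```javascript
// Hypothetical helper: format [name, value] pairs as one aligned `var` declaration.
// Names are treated as textual data (left-aligned); values are right-aligned.
function alignDeclarations(pairs) {
    var names = pairs.map(function (p) { return p[0]; });
    var values = pairs.map(function (p) { return String(p[1]); });

    // column widths are set by the longest name and the longest value
    var nameWidth = Math.max.apply(null, names.map(function (n) { return n.length; }));
    var valueWidth = Math.max.apply(null, values.map(function (v) { return v.length; }));

    var lines = pairs.map(function (p, i) {
        var prefix = (i === 0) ? 'var ' : '    ';
        var suffix = (i === pairs.length - 1) ? ';' : ',';
        return prefix + names[i].padEnd(nameWidth) + ' = ' +
               values[i].padStart(valueWidth) + suffix;
    });

    return lines.join('\n');
}

console.log(alignDeclarations([
    ['oneVariable', 10023],
    ['a', 370],
    ['p', 4],
    ['answerToLife', 42],
    ['openThePodBayDoorHal', false]
]));
```

Whether a formatter should do this automatically is a separate (and equally subjective) question, since aligned columns have to be re-padded whenever a longer name is added.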

Vertical Spacing:

Imagine this whole post with no spacing between paragraphs. Almost everyone can agree that would be harder to read and understand. Now, many could argue that more space between sections would be even better. In typography, sections are delineated with headers that have a larger font size and more vertical space (like you see in HTML with H1, etc.). In code, we have one font size, so we have to work with whitespace and whatever bracing concept the language uses (if any). Like horizontal spacing, more is better than less. Specifics about exactly what that means are subjective, but with most languages, programmers settle into a convention for that language that most people use. If you are defining your own language (or coding standard), then you can set the convention.
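
Here is a small made-up example of what I mean (the function `describeAverage` is hypothetical). Blank lines act like paragraph breaks, separating the three logical steps: validate the input, compute, and format the result.

```javascript
// Hypothetical example: one blank line between each logical step.
function describeAverage(numbers) {
    if (!Array.isArray(numbers) || numbers.length === 0) {
        return 'no data';
    }

    var total = 0;
    for (var i = 0; i < numbers.length; i++) {
        total += numbers[i];
    }
    var average = total / numbers.length;

    return 'average: ' + average.toFixed(2);
}

console.log(describeAverage([1, 2, 3, 4]));
```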

The most important thing, no matter what the standard is, is that it is used consistently throughout all your code. This is way more important than the specifics of the standard. Consistently formatted code is much easier to read no matter what the standard.

Search for typography readability studies for more information.

Upvotes: 7

David J.

Reputation: 32715

There is an annual IEEE conference titled the International Conference on Program Comprehension (ICPC) which often has experimental studies on program comprehension. The most relevant that I found from the past three years are:

  • An Eye Tracking Study on camelCase and under_score Identifier Styles "While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly."

  • To camelcase or under_score "Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style."

In addition to the computer-science specific cognitive literature, there are studies about online and hypertext reading:

  • [Chaparro, 2005] Reading Online Text with a Poor Layout: Is Performance Worse? by Barbara S. Chaparro, A. Dawn Shaikh, & J. Ryan Baker, Usability News, Volume 7, Issue 1, February 2005.

  • [Lin, 2004] Evaluating older adults' retention in hypertext perusal: impacts of presentation media as a function of text topology by Dyi-Yih Michael Lin in "Computers in Human Behavior", Volume 20, Issue 4, July 2004, Pages 491-503. Available from ScienceDirect

  • Cognitive load in hypertext reading: A review by Diana DeStefano and Jo-Anne LeFevre.

These papers address the question less directly, but I mention them in hopes that they provide some context. The first two references were found thanks to Michael Suodenjoki's blog post entitled White space matters in program source code.

Upvotes: 9
