Spellbinder2050
Spellbinder2050

Reputation: 652

K&R answer book exercise 1.21

Here is the question:

Write the program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing. Use the same tab stops as for detab. When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?

I did the exercise myself, and the book's answer has a different solution. I'm not understanding a mathematical formula that executes when a tab is read from the input stream. Here is the code:

#include <stdio.h>
#define TABINC 8

main()
{
int c, nb, nt, pos;

nb = 0;
nt = 0;
for (pos = 1; (c =getchar()) != EOF; ++pos)
    if (c == ' '){
        if (pos % TABINC != 0)
            ++nb;
        else {
            nb = 0;
            ++nt;
        }
    }
    else {
        for ( ; nt > 0; --nt)
            putchar('\t');
        if (c == '\t')
            nb = 0;
        else
            for ( ; nb > 0; --nb)
                putchar(' ');
        putchar(c);
        if ( c == '\n')
            pos = 0;
        else if (c == '\t')
            pos = pos + (TABINC - (pos - 1) % TABINC) - 1;
    }
}

The part I'm not understanding is the following:

else if (c == '\t')
    pos = pos + (TABINC - (pos - 1) % TABINC) - 1;

I can see through debugging in visual studio that this construct brings the pos to the next tab stop when a tab occurs from the input stream. Is this correct?

What I really don't understand is how this formula works or how they came up with it. is this a common formula that occurs in programming? Is it useful? Does it have a name?

Edit: I do understand what the modulo operator does. I'm sorry I didn't specify that.

Upvotes: 1

Views: 1058

Answers (2)

deviantfan
deviantfan

Reputation: 11434

It does not have a name or something like that.
In detail:

First of all, how much pos has to be increased depends on pos%TABINC,
ie. TABINC is 8, so if pos is a multiple from 8, add 8,
if pos%8 is 1 (like 9, 17...) then add 7,
if pos%8 is 2 (10, 18...) add 6 and so on.

Full list:
pos%8 -> number to add
0 -> 8
1 -> 7
2 -> 6
3 -> 5
4 -> 4
5 -> 3
6 -> 2
7 -> 1

That would be 8 - pos%8 or, more general TABINC - pos%TABINC

Important: A negative number modulo something in C is mathematically not correct
In C, for a,b >= 0: (-a)%b == -(a%b)

What is added in the code is (TABINC - (pos - 1) % TABINC) - 1
With some basic math and the fact above, this is
(TABINC - (pos - 1) % TABINC) - 1
= TABINC - ((pos - 1) % TABINC)) - 1
= TABINC - ((pos % TABINC) - 1) - 1
= TABINC + 1 - (pos % TABINC) - 1
= TABINC - (pos % TABINC)
which is the same as my short formula above, only more complex for no reason.

Upvotes: 0

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726987

There is no specific name to this formula - it is a relatively straightforward way of applying math from the elementary school to everyday problems. Here is what's going on: '\t' character advances the pos by a number of positions ranging from one to TABINC, inclusive.

  • When pos is a multiple of TABINC, you jump the full TABINC
  • When pos is one below the next multiple of TABINC, you jump by one,
  • When pos is two below the next multiple of TABINC, you jump by two,
  • and so on - when pos is x, where 0 < x < TABINC, below the next multiple of TABINC, you jump x

Now the problem of calculating a jump is reduced to computing the difference between pos and the next multiple of TABINC. This can be done by computing a division remainder of the pos and TABINC, and subtracting that remainder from the TABINC. This is done with the % operator.

Since pos is one-based *, the first thing the formula does is making it zero-based for the purpose of calculating the remainder. Next, the formula computes the remainder, which is a mathematical way of saying "the number of positions above the last TABINC stop". Now all you need is subtracting that remainder from TABINC to get your result.

* The assignment pos=0 on discovering '\n' seemingly contradicts the assertion that pos is one-based. However, the loop header performs pos++ after each iteration, so the next iteration of the loop sees pos=1 on the next iteration.

Upvotes: 1

Related Questions