Problems when trying to skip '\n' in reading txt files

Question

I wrote a fairly small program to help with txt file formatting, but when I tried to read from the input file and skip unwanted '\n' characters, I actually skipped the character after each '\n' instead.

The sample file I work on contains these characters:

abcde
abc

   ab
abcd

And my code looks like this:

while (!feof(fp1)) {
    ch = fgetc(fp1);
    if (ch != '\n') {
        printf("%c",ch);
    }
    else {
        ch = fgetc(fp1); // move to the next character
        if (ch == '\n') {
            printf("%c",ch);
        }
    }
}

The expected result is

abcdeabc
  ababcd

But I actually got

abcdebc
   abbcd

I guess the problem is in the line ch = fgetc(fp1); // move to the next character, but I just can't find a correct way to implement this idea.

paxdiablo · Accepted Answer

Think of the flow of your code (lines numbered below):

 1:  while (!feof(fp1)) {
 2:      ch = fgetc(fp1);
 3:      if (ch != '\n') {
 4:          printf("%c",ch);
 5:      }
 6:      else {
 7:          ch = fgetc(fp1); // move to the next character
 8:          if (ch == '\n') {
 9:              printf("%c",ch);
10:          }
11:      }
12:  }

When you get a newline followed by non-newline, the flow is (starting at the else line): 6, 7, 8, 10, 11, 12, 1, 2.

It's that execution of the final 2 in that sequence that effectively throws away the non-newline character that you had read at 7.
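If you would rather keep your look-ahead structure, one possible fix is to push the non-newline character back onto the stream with ungetc so that the next read at the top of the loop sees it again. The following is only a minimal sketch under that assumption; the function name skip_single_newlines and the in/out stream parameters are hypothetical and not part of your original code:

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, dropping a lone '\n' but
   keeping one '\n' for each pair "\n\n", as in the question's logic. */
void skip_single_newlines(FILE *in, FILE *out) {
    int ch;                            /* int, so that EOF is representable */
    while ((ch = fgetc(in)) != EOF) {
        if (ch != '\n') {
            fputc(ch, out);            /* ordinary character: copy it */
            continue;
        }
        int next = fgetc(in);          /* look ahead one character */
        if (next == '\n') {
            fputc(next, out);          /* second newline of a pair: keep it */
        } else if (next != EOF) {
            ungetc(next, in);          /* not a newline: push it back */
        }
    }
}
```

Note that this keeps your pairing behaviour, so a run of four newlines comes out as two rather than one, which differs from the counter-based approach described next.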

If your intent is to basically throw away single newlines and convert sequences of newlines (two or more) to a single one^(a), you can use something like the following pseudo-code:

set numNewlines to zero
while not end-file:
    get thisChar
    if numNewlines is one or thisChar is not newline:
        output thisChar
    if thisChar is newline:
        increment numNewlines
    else:
        set numNewlines to zero

This reads the character in one place, making it less likely that you'll inadvertently skip one due to confused flow.

It also uses the newline history to decide what gets printed. It only outputs a newline on the second occurrence in a sequence of newlines, ignoring the first and any after the second.

That means that a single newline will never be echoed and any group of two or more will be transformed into one.

Some actual C code that demonstrates this^(b) follows:

#include <stdio.h>
#include <stdbool.h>

int main(void) {
    // Open file.

    FILE *fp = fopen("testprog.in", "r");
    if (fp == NULL) {
        fprintf(stderr, "Cannot open input file\n");
        return 1;
    }

    // Process character by character.

    int numNewlines = 0;
    while (true) {
        // Get next character, stop if none left.

        int ch = fgetc(fp);
        if (ch == EOF) break;

        // Output only second newline in a sequence of newlines,
        // or any non-newline.

        if (numNewlines == 1 || ch != '\n') {
            putchar(ch);
        }

        // Manage sequence information.

        if (ch == '\n') {
            ++numNewlines;
        } else {
            numNewlines = 0;
        }
    }

    // Finish up cleanly.

    fclose(fp);
    return 0;
}
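
To check how the counter logic treats the sample input and longer runs of newlines without needing a file, the same idea can be wrapped in a function that works on strings. This is just a sketch for experimenting; the name collapse_newlines and its parameters are hypothetical, and the caller is assumed to supply a large enough destination buffer:

```c
#include <stddef.h>

/* Hypothetical helper: applies the newline-counter logic to the
   NUL-terminated string `src`, writing the result to `dst`, which must
   hold at least strlen(src) + 1 bytes. Returns the characters written. */
size_t collapse_newlines(const char *src, char *dst) {
    size_t len = 0;
    int numNewlines = 0;
    for (; *src != '\0'; ++src) {
        /* Copy any non-newline, or the second newline of a run. */
        if (numNewlines == 1 || *src != '\n') {
            dst[len++] = *src;
        }
        /* Track how many newlines have been seen in a row. */
        if (*src == '\n') {
            ++numNewlines;
        } else {
            numNewlines = 0;
        }
    }
    dst[len] = '\0';
    return len;
}
```

With the sample from the question, "abcde\nabc\n\n   ab\nabcd\n", this yields "abcdeabc\n   ababcd", and a run such as "a\n\n\n\nb" becomes "a\nb".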

^(a) It's unclear from your question how you want to handle sequences of three or more newlines so I've had to make an assumption.

^(b) Of course, you shouldn't use this if your intent is to learn, because:

You'll learn more if you try yourself and have to fix any issues.
Educational institutions will almost certainly check submitted code against a web search, and you'll probably be pinged for plagiarism.

I'm just providing it for completeness.
