dyingStudent

Reputation: 11

Squeeze blank lines into one blank line in C

Hello, this is about the same problem as the question below, but with different code:

Replacing multiple new lines in a file with just one

#include <stdio.h>

void format(void);

int main(void){

    format();
    printf("\n");
    return 0;
}

void format(void){
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    while (( c= getchar()) != EOF ){

        /*TABS*/
        if(c == '\t'){
            c = ' ';
        }
        /*SPACES*/
        if (c ==' '){
            if(nspace > 0){
                continue;
            }
            else{
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }

        /*NEW LINE*/
        else if(c == '\n'){
            if(++nlines >2){
                continue;
            }
            else {
                nlines++;
                nspace = 0;
            }
            putchar(c);
        }   
        else{
            putchar(c);
            nspace = 0;
            nlines = 0;
        }       
    }
}

I want to squeeze multiple blank lines into one blank line, but it doesn't seem to work. Also, on the Cygwin terminal, the last line of stdout is followed by an extra blank line, although the input doesn't end with a blank line.

For example
INPUT

Hello   Hi\n
\n
\n
Hey\t\tHola\n

DESIRED OUTPUT

Hello Hi\n
\n
Hey Hola\n

ACTUAL OUTPUT

Hello Hi\n
Hey Hola\n

Please explain!

Upvotes: 0

Views: 447

Answers (2)

Jonathan Leffler

Reputation: 754110

Here's a variant of your code. I eliminated the format() function (which is unusual for me, since most programs on SO don't use enough functions), incorporating it directly into main(). The code now treats spaces and newlines more symmetrically, which also fixes the double-increment problem identified in paddy's answer. It also prints a newline at the end only if there wasn't already one there, which normalizes files that do not end with a newline. The initialization nlines = 1; deals with multiple newlines at the start of the file; that was already well done.

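#include <stdio.h>

int main(void)
{
    int c;
    size_t nlines = 1;      /* consecutive newlines; 1 squeezes leading blank lines */
    size_t nspace = 0;      /* consecutive spaces */

    while ((c = getchar()) != EOF)
    {
        if (c == '\t')      /* treat tabs as spaces */
            c = ' ';
        if (c == ' ')
        {
            if (nspace < 1) /* print only the first space of a run */
            {
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }
        else if (c == '\n')
        {
            if (nlines < 2) /* allow at most one blank line in a row */
            {
                putchar(c);
                nlines++;
                nspace = 0;
            }
        }
        else
        {
            putchar(c);
            nspace = 0;
            nlines = 0;
        }
    }
    if (nlines == 0)        /* input did not end with a newline: add one */
        putchar('\n');
    return 0;
}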

My testing uses some Bash-specific notations. My program was called sb73. The last test input does not include a final newline. In the outputs, ⌴ marks a newline:

$ echo $'Hello   Hi\n\n\nHey\t\tHola\n' | sb73
Hello Hi⌴
⌴
Hey Hola⌴
⌴
$

and:

$ echo $'\n\nHello   Hi\n\n\n    Hey\t\tHola\n' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
⌴
$

and:

$ printf '%s' $'\n\nHello   Hi\n\n\n    Hey\t\tHola' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
$
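To check the same cases without a shell pipeline, here is a minimal sketch that applies the sb73 rules to an in-memory string instead of stdin/stdout. The function name squeeze and its buffer contract (out must hold at least strlen(in) + 2 bytes, since every input character yields at most one output character plus an optional final newline and the terminator) are assumptions for illustration, not part of the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: same rules as the sb73 program, but reading from the
 * string `in` and writing a NUL-terminated result to `out`.
 * `out` must have room for strlen(in) + 2 bytes. */
static void squeeze(const char *in, char *out)
{
    size_t nlines = 1;   /* as in sb73: squeezes leading blank lines */
    size_t nspace = 0;
    char *p = out;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++)
    {
        char c = *in;
        if (c == '\t')
            c = ' ';                 /* tabs count as spaces */
        if (c == ' ')
        {
            if (nspace < 1)          /* keep only the first space of a run */
            {
                *p++ = c;
                nspace++;
                nlines = 0;
            }
        }
        else if (c == '\n')
        {
            if (nlines < 2)          /* at most one blank line in a row */
            {
                *p++ = c;
                nlines++;
                nspace = 0;
            }
        }
        else
        {
            *p++ = c;
            nspace = 0;
            nlines = 0;
        }
    }
    if (nlines == 0)                 /* add a final newline if missing */
        *p++ = '\n';
    *p = '\0';
}
```

For example, squeeze("\n\nHello   Hi\n\n\n    Hey\t\tHola", buf) leaves "\nHello Hi\n\n Hey Hola\n" in buf, matching the third run above.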

Handling CRLF line endings

The comments report that the code above doesn't work on a Cygwin terminal; the plausible reason is that the input data has CRLF line endings. There are various ways around this. One is to find a way of forcing the standard input into text mode. In text mode, CRLF line endings should be mapped to Unix-style '\n' (NL or LF only) endings on input, and Unix-style line endings should be mapped to CRLF line endings on output.

Alternatively, it would be possible simply to ignore CR characters:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb47.c  2017-06-08 22:40:24.000000000 -0700
@@ -19,6 +19,8 @@
                 nlines = 0;
             }
         }
+        else if (c == '\r')
+            continue;    // Windows?
         else if (c == '\n')
         {
             if (nlines < 2)

That's a 'unified diff' showing two extra lines in the code. Alternatively, it is possible to treat a CR that is not followed by LF as a regular character, while still handling CR followed by LF as a single newline:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb59.c  2017-06-08 22:42:43.000000000 -0700
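@@ -19,6 +19,16 @@
                 nlines = 0;
             }
         }
+        else if (c == '\r')
+        {
+            int next = getchar();
+            ungetc(next, stdin);    // push the lookahead back (no-op at EOF)
+            if (next == '\n')
+                continue;           // CR LF: the LF is processed as a newline
+            putchar('\r');          // lone CR: an ordinary character
+            nspace = 0;
+            nlines = 0;
+        }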
         else if (c == '\n')
         {
             if (nlines < 2)

There's probably a way to write a state machine that handles CR, but that would be more complex.
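Short of a full state machine, the CRLF rule from the second diff is easy to express when the whole input is in memory, because the character after a CR can be inspected directly instead of being read and pushed back with ungetc(). A minimal sketch, assuming a string-to-buffer interface (the name squeeze_crlf and the buffer contract are hypothetical): a CR immediately followed by LF is skipped so the LF counts as the newline, a lone CR is copied like any other character, and the output has Unix line endings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string-based variant of the CRLF handling: CR LF is treated
 * as one newline, a lone CR is an ordinary character, and the result uses
 * LF line endings. `out` needs room for strlen(in) + 2 bytes. */
static void squeeze_crlf(const char *in, char *out)
{
    size_t nlines = 1;
    size_t nspace = 0;
    char *p = out;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++)
    {
        char c = *in;
        if (c == '\t')
            c = ' ';
        if (c == ' ')
        {
            if (nspace < 1)
            {
                *p++ = c;
                nspace++;
                nlines = 0;
            }
        }
        else if (c == '\r' && in[1] == '\n')
        {
            continue;                /* CR LF: let the LF be the newline */
        }
        else if (c == '\n')
        {
            if (nlines < 2)
            {
                *p++ = c;
                nlines++;
                nspace = 0;
            }
        }
        else
        {
            *p++ = c;                /* ordinary character, including a lone CR */
            nspace = 0;
            nlines = 0;
        }
    }
    if (nlines == 0)
        *p++ = '\n';
    *p = '\0';
}
```

With this, the CRLF version of the question's input, "Hello   Hi\r\n\r\n\r\nHey\t\tHola\r\n", squeezes to "Hello Hi\n\nHey Hola\n", and "a\rb" becomes "a\rb\n" with the lone CR kept.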

I have a utod program that converts Unix-style line endings to Windows-style; I used that in the pipeline to test the new variants of the code.
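The utod program itself isn't shown. Below is a minimal sketch of what such a Unix-to-Windows converter might do; the function name unix_to_dos is hypothetical and this is not the author's implementation. It copies its input, writing a CR before every LF, and assumes both streams are in binary mode: on a platform that translates LF to CRLF for text-mode streams, a text-mode output would end up with CR CR LF. Input that already has CRLF endings would likewise gain an extra CR.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch of a Unix-to-Windows line-ending converter (not the
 * author's utod): copies `in` to `out`, writing CR LF for every LF.
 * Assumes binary streams, so the library does no translation of its own.
 * Returns 0 on success, EOF if a write fails. */
static int unix_to_dos(FILE *in, FILE *out)
{
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF)
    {
        if (c == '\n' && putc('\r', out) == EOF)
            return EOF;             /* write error before the LF */
        if (putc(c, out) == EOF)
            return EOF;             /* write error */
    }
    return 0;
}
```

A main() that calls unix_to_dos(stdin, stdout) would turn this into a filter for a pipeline like the ones above, provided the standard streams are not in a CRLF-translating text mode.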

Upvotes: 1

paddy
paddy

Reputation: 63481

You're incrementing nlines twice:

else if(c == '\n'){
    if(++nlines >2){  /* incremented here */
        continue;
    }
    else {
        nlines++;     /* incremented here */
        nspace = 0;
    }
    putchar(c);
}

You just want to do it once. I'd suggest just incrementing the counter until it hits 2 and then not incrementing it any more. That just means a small change to the first test, leaving the nlines++ in the else branch as the only increment:

    if(nlines >= 2){
        continue;
    }
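For reference, here is a sketch of the question's format() with that one change applied. Purely so it can be exercised without a terminal, it is generalized to take input and output streams; the name format_stream is hypothetical. Note that the question's main() still calls printf("\n") after format(), and that is what adds the extra blank line at the end of the output; the other answer avoids this by printing a final newline only when the input lacked one.

```c
#include <assert.h>
#include <stdio.h>

/* The question's format() with paddy's fix: nlines is incremented once per
 * newline and stops at 2. Generalized (an assumption, for testing) to read
 * from `in` and write to `out` instead of stdin/stdout. */
static void format_stream(FILE *in, FILE *out)
{
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF)
    {
        if (c == '\t')                 /* TABS become spaces */
            c = ' ';
        if (c == ' ')                  /* SPACES: keep the first of a run */
        {
            if (nspace > 0)
                continue;
            putc(c, out);
            nspace++;
            nlines = 0;
        }
        else if (c == '\n')            /* NEW LINE */
        {
            if (nlines >= 2)           /* the fix: test without incrementing */
                continue;
            nlines++;                  /* the single increment */
            nspace = 0;
            putc(c, out);
        }
        else
        {
            putc(c, out);
            nspace = 0;
            nlines = 0;
        }
    }
}
```

Fed the question's input, this writes exactly the desired output, "Hello Hi\n\nHey Hola\n".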

Upvotes: 1
