Squeeze blank lines into one blank line in C

Question

Hello. This refers to the same problem as the question below, but with different code.

Replacing multiple new lines in a file with just one

int main(void){

    format();
    printf("\n");
    return 0;
}

void format(){
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    while (( c= getchar()) != EOF ){

        /*TABS*/
        if(c == '\t'){
            c = ' ';
        }
        /*SPACES*/
        if (c ==' '){
            if(nspace > 0){
                continue;
            }
            else{
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }

        /*NEW LINE*/
        else if(c == '\n'){
            if(++nlines >2){
                continue;
            }
            else {
                nlines++;
                nspace = 0;
            }
            putchar(c);
        }   
        else{
            putchar(c);
            nspace = 0;
            nlines = 0;
        }       
    }
}

I want to squeeze multiple blank lines into a single blank line, but it doesn't seem to work. Also, on the Cygwin terminal, the last line of standard output is followed by an extra blank line even though the input has no blank line at the end.

For example
INPUT

Hello   Hi





Hey		Hola

DESIRED OUTPUT

Hello Hi



Hey Hola

ACTUAL OUTPUT

Hello Hi

Hey Hola

Please explain!

Jonathan Leffler · Accepted Answer

Here's a variant of your code. I eliminated the format() function (which is unusual for me since most programs on SO don't use enough functions) incorporating it directly into main(). The code treats spaces and newlines more symmetrically now, fixing the double increment problem also identified in paddy's answer. It also only prints out a newline at the end if there wasn't already a newline at the end. That normalizes files which do not end with a newline. The initialization of nlines = 1; deals with multiple newlines at the start of the file — that was well done already.

#include <stdio.h>

int main(void)
{
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    while ((c = getchar()) != EOF)
    {
        if (c == '\t')
            c = ' ';
        if (c == ' ')
        {
            if (nspace < 1)
            {
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }
        else if (c == '\n')
        {
            if (nlines < 2)
            {
                putchar(c);
                nlines++;
                nspace = 0;
            }
        }
        else
        {
            putchar(c);
            nspace = 0;
            nlines = 0;
        }
    }
    if (nlines == 0)
        putchar('\n');
    return 0;
}
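The double increment can be seen in isolation by counting how many newlines each rule lets through for a run of consecutive newlines that follows ordinary text (so the counter starts at 0). This is only an illustrative sketch: the function names emitted_original and emitted_fixed are hypothetical, and the run is assumed to contain LF characters only.

```c
#include <stddef.h>

/* Rule from the question: the counter is bumped once in the test
 * (++nlines) and again in the else branch (nlines++). */
size_t emitted_original(size_t run)
{
    size_t nlines = 0, emitted = 0;
    for (size_t i = 0; i < run; i++)
    {
        if (++nlines > 2)
            continue;       /* suppressed */
        nlines++;           /* second increment for the same newline */
        emitted++;
    }
    return emitted;
}

/* Rule from the answer: one increment per newline actually printed. */
size_t emitted_fixed(size_t run)
{
    size_t nlines = 0, emitted = 0;
    for (size_t i = 0; i < run; i++)
    {
        if (nlines < 2)
        {
            emitted++;
            nlines++;
        }
    }
    return emitted;
}
```

For a run of five newlines the first rule lets only one through, so no blank line survives; the second lets two through, which leaves exactly one blank line.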

My testing uses some Bash-specific notation (ANSI-C quoting, $'...'). My program was called sb73. The last test input does not include a final newline. The outputs use ⌴ to mark each newline in the output:

$ echo $'Hello   Hi\n\n\nHey\t\tHola\n' | sb73
Hello Hi⌴
⌴
Hey Hola⌴
⌴
$

and:

$ echo $'\n\nHello   Hi\n\n\n    Hey\t\tHola\n' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
⌴
$

and:

$ printf '%s' $'\n\nHello   Hi\n\n\n    Hey\t\tHola' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
$

Handling CRLF line endings

The comments identify that the code above doesn't work on a Cygwin terminal, and the plausible reason is that the data being modified has CRLF line endings. There are various ways around this. One is to find a way of forcing the standard input into text mode. In text mode, CRLF line endings should be mapped to Unix-style '\n' (NL or LF only) endings on input, and Unix-style line endings should be mapped to CRLF line endings on output.
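One possible way to request text mode is sketched below. It is an assumption rather than something tested for this answer: stdin_text_mode is a hypothetical helper name, the _setmode call is the Microsoft C runtime spelling, and the setmode call with O_TEXT is assumed to be what Cygwin provides in <io.h>. On other Unix-like systems there is no text/binary distinction, so the helper does nothing.

```c
#include <stdio.h>

#if defined(_WIN32) || defined(__CYGWIN__)
#include <fcntl.h>
#include <io.h>
#endif

/* Hypothetical helper: ask the C runtime to treat stdin as a text stream.
 * Returns 0 on success (or where nothing needs doing), -1 on failure. */
int stdin_text_mode(void)
{
#if defined(_WIN32)
    /* Microsoft C runtime spelling. */
    return _setmode(_fileno(stdin), _O_TEXT) == -1 ? -1 : 0;
#elif defined(__CYGWIN__)
    /* Cygwin spelling; assumed to be declared in <io.h>. */
    return setmode(fileno(stdin), O_TEXT) == -1 ? -1 : 0;
#else
    /* No text/binary distinction here, so there is nothing to change. */
    return 0;
#endif
}
```

It would be called once at the top of main(), before the first getchar().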

Alternatively, it would be possible simply to ignore CR characters:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb47.c  2017-06-08 22:40:24.000000000 -0700
@@ -19,6 +19,8 @@
                 nlines = 0;
             }
         }
+        else if (c == '\r')
+            continue;    // Windows?
         else if (c == '\n')
         {
             if (nlines < 2)

That's a 'unified diff' showing two extra lines in the code. Or it is possible to handle CR not followed by LF as a regular character and yet handle CR followed by LF as a newline combination:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb59.c  2017-06-08 22:42:43.000000000 -0700
@@ -19,6 +19,17 @@
                 nlines = 0;
             }
         }
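+        else if (c == '\r')
+        {
+            int next = getchar();
+            if (next != EOF)
+                ungetc(next, stdin);    /* push back whatever followed the CR */
+            if (next == '\n')
+                continue;               /* CRLF: drop the CR, LF is handled next */
+            putchar('\r');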
+            nspace = 0;
+            nlines = 0;
+        }
         else if (c == '\n')
         {
             if (nlines < 2)

There's probably a way to write a state machine that handles CR, but that would be more complex.
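A minimal sketch of such a state machine is shown here, using a single "pending CR" flag instead of lookahead. It is not the code from sb73: the helper name squeeze_str is hypothetical, and it works on a NUL-terminated string rather than on standard input so that it can be checked in isolation. A CR followed by LF is treated as one newline (written as LF only); a CR followed by anything else is written out as an ordinary character.

```c
#include <stddef.h>

/* Hypothetical helper: squeeze `in` into `out`.
 * `out` must have room for strlen(in) + 2 bytes. */
void squeeze_str(const char *in, char *out)
{
    size_t nlines = 1;      /* consecutive newlines written so far */
    size_t nspace = 0;      /* consecutive spaces written so far */
    int pending_cr = 0;     /* state: a CR was read but not yet resolved */
    size_t o = 0;

    for (; *in != '\0'; in++)
    {
        int c = (unsigned char)*in;

        if (pending_cr)
        {
            pending_cr = 0;
            if (c != '\n')
            {
                /* The CR was not part of CRLF: emit it as a regular character. */
                out[o++] = '\r';
                nspace = 0;
                nlines = 0;
            }
        }
        if (c == '\r')
        {
            pending_cr = 1;     /* decide when the next character arrives */
            continue;
        }
        if (c == '\t')
            c = ' ';
        if (c == ' ')
        {
            if (nspace < 1)
            {
                out[o++] = ' ';
                nspace++;
                nlines = 0;
            }
        }
        else if (c == '\n')
        {
            if (nlines < 2)
            {
                out[o++] = '\n';
                nlines++;
                nspace = 0;
            }
        }
        else
        {
            out[o++] = (char)c;
            nspace = 0;
            nlines = 0;
        }
    }
    if (pending_cr)
    {
        /* Input ended right after a lone CR. */
        out[o++] = '\r';
        nlines = 0;
    }
    if (nlines == 0)
        out[o++] = '\n';    /* normalize a missing final newline */
    out[o] = '\0';
}
```

In the getchar()-based filter, the same flag would live in main() and the writes to out would become putchar() calls.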

I have a utod program that converts Unix-style line endings to Windows-style; I used that in the pipeline to test the new variants of the code.
