cptkirk87

Reputation: 35

C Program won't remove comments that take up the whole line

So I'm working through the K&R C book and there is a bug in my code that I simply cannot figure out.

The program is supposed to remove all the comments from a C program. For now I'm just reading the input from stdin:

#include <stdio.h>

int getaline (char s[], int lim);

#define MAXLINE 1000 //maximum number of characters to put into string[]

#define OUTOFCOMMENT 0
#define INASINGLECOMMENT 1
#define INMULTICOMMENT 2

int main(void)
{
    int i;
    int isInComment;
    char string[MAXLINE];

    getaline(string, MAXLINE);

    for (i = 0; string[i] != EOF; ++i) {
        //finds whether loop is in a comment or not
        if (string[i] == '/') {
            if (string[i+1] == '/')
                isInComment = INASINGLECOMMENT;
            if (string[i+1] == '*')
                isInComment = INMULTICOMMENT;
        }
        //fixes the problem of print messing up after the comment
        if (isInComment == INASINGLECOMMENT && string[i] == '\0')
            printf("\n");

        //if the line is done, restates all the variables
        if (string[i] == '\0') {
            getaline(string, MAXLINE);
            i = 0;
            if (isInComment != INMULTICOMMENT)
                isInComment = OUTOFCOMMENT;
        }

        //prints current character in loop
        if(isInComment == OUTOFCOMMENT && string[i] != EOF)
            printf("%c", string[i]);

        //checks to see of multiline comment is over
        if(string[i] == '*' && string[i+1] == '/' ) {
            ++i;
            isInComment = OUTOFCOMMENT;
        }

    }
    return 0;

}

So this works great except for one problem. Whenever a line starts with a comment, it prints that comment.

So for instance, if I had a line that was simply

//this is a comment

without anything before the comment begins, it will print that comment even though it's not supposed to.

I thought I was making good progress, but this bug has really been holding me up. I hope this isn't some super easy thing I've missed.

EDIT: I forgot to include the getaline function:

//puts line into s[], returns length of that line
int getaline(char s[], int lim)
{
    int c, i;

    for (i = 0; i < lim-1 && (c = getchar()) != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

Upvotes: 3

Views: 147

Answers (2)

chqrlie

Reputation: 144770

There are many problems in your code:

  • isInComment is not initialized in function main.
  • As others have pointed out, string[i] != EOF is wrong. You need to test for end of file more precisely, especially for files that do not end with a linefeed. This test only works if the char type is signed and EOF is a valid signed char value. Even then, it will mistakenly stop on a stray \377 character, which is legal in a string or in a comment.
  • When you detect the end of line, you read another line and reset i to 0, but i will be incremented by the for loop before you test again for single line comment... hence the bug!
  • You do not handle special cases such as /* // */ or // /*
  • You do not handle strings. This is not a comment: "/*", nor this: '//'
  • You do not handle \ at end of line (escaped linefeed). This can be used to extend single line comments, strings, etc. There are more subtle cases related to \ handling and if you really want completeness, you should handle trigraphs too.
  • Your implementation has a limit on line size, which is not needed.
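To illustrate the end-of-file point from the list above, here is a minimal sketch of a line reader that stops at EOF as well as at a linefeed. The name getaline_eof and the FILE * parameter are assumptions made for this example and are not part of the question's code. The key detail is that the character is kept in an int, so EOF stays distinct from every valid byte, including \377.

```c
#include <stdio.h>

/* Hypothetical variant of getaline: reads one line from `in` into s[].
   Stops at EOF as well as at '\n'. Returns the length of the line,
   or 0 when nothing could be read. Assumes lim >= 1. */
int getaline_eof(FILE *in, char s[], int lim)
{
    int c = EOF;    /* int, not char, so EOF differs from every byte value */
    int i;

    for (i = 0; i < lim - 1 && (c = getc(in)) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {            /* keep the linefeed, as getaline does */
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
```

With a function like this, the caller can loop with while (getaline_eof(stdin, string, MAXLINE) > 0) instead of comparing string[i] to EOF.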

The problem you are assigned is a bit tricky. Instead of reading and parsing lines, read one character at a time and implement a state machine to parse escaped linefeeds, strings, and both comment styles. The code is not too difficult if you do it right with this method.
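The state machine approach could be sketched roughly as follows. This is one possible layout, not a complete solution: the state names and the strip_comments function are invented for this example, trigraphs are ignored, and an escaped linefeed is only recognized inside // comments and inside literals. Each block comment is replaced by a single space so that the tokens around it stay separate.

```c
#include <stdio.h>

/* Parser states; all names are illustrative, not taken from the question. */
enum state {
    CODE,           /* ordinary source text */
    SLASH,          /* saw '/', which may start a comment */
    LINE_COMMENT,   /* inside a // comment */
    LINE_ESC,       /* saw '\' inside a // comment */
    BLOCK_COMMENT,  /* inside a block comment */
    BLOCK_STAR,     /* saw '*' inside a block comment */
    QUOTED,         /* inside a string or character literal */
    QUOTED_ESC      /* saw '\' inside a literal */
};

/* Copies `in` to `out` one character at a time, dropping comments. */
void strip_comments(FILE *in, FILE *out)
{
    enum state st = CODE;
    int quote = 0;  /* the quote character that opened the current literal */
    int c;          /* int, so that EOF is distinct from every byte */

    while ((c = getc(in)) != EOF) {
        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;             /* hold the '/' until we know more */
            } else {
                if (c == '"' || c == '\'') {
                    quote = c;
                    st = QUOTED;
                }
                putc(c, out);
            }
            break;
        case SLASH:
            if (c == '/') {
                st = LINE_COMMENT;
            } else if (c == '*') {
                st = BLOCK_COMMENT;
            } else {                    /* plain division: emit the held '/' */
                putc('/', out);
                putc(c, out);
                if (c == '"' || c == '\'') {
                    quote = c;
                    st = QUOTED;
                } else {
                    st = CODE;
                }
            }
            break;
        case LINE_COMMENT:
            if (c == '\\') {
                st = LINE_ESC;
            } else if (c == '\n') {
                putc('\n', out);        /* the linefeed is not part of the comment */
                st = CODE;
            }
            break;
        case LINE_ESC:                  /* '\' then linefeed continues the comment */
            if (c != '\\')
                st = LINE_COMMENT;
            break;
        case BLOCK_COMMENT:
            if (c == '*')
                st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') {
                putc(' ', out);         /* a comment counts as one space */
                st = CODE;
            } else if (c != '*') {
                st = BLOCK_COMMENT;
            }
            break;
        case QUOTED:
            putc(c, out);
            if (c == '\\')
                st = QUOTED_ESC;
            else if (c == quote || c == '\n')   /* '\n': recover from a bad literal */
                st = CODE;
            break;
        case QUOTED_ESC:
            putc(c, out);               /* escaped character, copied verbatim */
            st = QUOTED;
            break;
        }
    }
    if (st == SLASH)
        putc('/', out);                 /* input ended right after a lone '/' */
}
```

Calling strip_comments(stdin, stdout) from main gives a filter with no line length limit, and a line that starts with // needs no special handling.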

Upvotes: 4

md5

Reputation: 23699

    if (string[i] == '\0') {
        getaline(string, MAXLINE);
        i = 0;
        if (isInComment != INMULTICOMMENT)
            isInComment = OUTOFCOMMENT;
    }

When you start a new line, you reset i to 0. But then, before the next iteration, the increment in the loop header runs:

for (i = 0; string[i] != EOF; ++i)

i is incremented before the comment check at the top of the loop runs again, so that check first sees the new line at index 1. The // at index 0 is never detected, which is why a line that begins with // gets printed.

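You can see that the problem goes away if you write instead:

    if (string[i] == '\0') {
        getaline(string, MAXLINE);
        if (isInComment != INMULTICOMMENT)
            isInComment = OUTOFCOMMENT;
        i = -1;     /* the ++i in the loop header brings it back to 0 */
        continue;   /* skip the rest of the body so string[-1] is never read */
    }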

though it's usually considered bad style to modify for loop indices inside the loop. You may want to redesign your implementation in a more readable way.
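One way such a redesign might look, if you want to stay with line-by-line processing: handle a single line in a helper and carry only the block comment state between calls, so the index is never reset in the middle of a loop. The helper name strip_line and its parameters are assumptions made for this sketch. Like the code in the question, it does not recognize string literals.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `line` to `out` without its comments.
   *in_block remembers across calls whether a block comment is open.
   `out` must have room for strlen(line) + 1 bytes. */
void strip_line(const char *line, char *out, int *in_block)
{
    size_t i = 0;   /* read position in line */
    size_t n = 0;   /* write position in out */

    while (line[i] != '\0') {
        if (*in_block) {
            if (line[i] == '*' && line[i + 1] == '/') {
                *in_block = 0;      /* block comment ends here */
                i += 2;
            } else {
                i += 1;             /* still inside the comment */
            }
        } else if (line[i] == '/' && line[i + 1] == '*') {
            *in_block = 1;          /* block comment starts here */
            i += 2;
        } else if (line[i] == '/' && line[i + 1] == '/') {
            /* the rest of the line is a comment; keep only the linefeed */
            if (line[strlen(line) - 1] == '\n')
                out[n++] = '\n';
            break;
        } else {
            out[n++] = line[i++];   /* ordinary character */
        }
    }
    out[n] = '\0';
}
```

The caller then becomes a plain loop such as while (fgets(line, sizeof line, stdin) != NULL) { strip_line(line, out, &in_block); fputs(out, stdout); }, with in_block initialized to 0 before the loop.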

Upvotes: 4
