heisthere
heisthere

Reputation: 372

fgets stops before reading to the end of csv, and wrong output

I use fgets to read csv, which has 100000 lines.

int** readCsv(char *str) {

    char file_name[100];
    strcpy (file_name, PATH);
    strcat( file_name, str);
    FILE *fp;
    fp = fopen(file_name, "r");

    if (!fp) {
        fprintf(stderr, "failed to open file for reading\n");
        //return 1;
    }

    char line[MAX_LINE_SIZE];
    int *result = NULL;
    int **arr;
    int len[ROW];
    arr = (int **)malloc(sizeof(int *) * ROW);
    int row = 0;

    while(fgets(line, MAX_LINE_SIZE, fp) != NULL) {
        int column;
        printf("%s\n", line);
        // plus one because the last number did not be added with comma
        column = countComma(line) + 1;
        len[row] = column;
        printf("%d\t", column);
        result = strtok(line, ",");
        arr[row] = (int *)malloc(sizeof(int) * column);
        column = 0;
        while( result != NULL ) {
            arr[row][column] = atoi(result);
            //printf("%d\t", arr[row][column]);
            result = strtok(NULL, ",");
            column++;
        }
        printf("%d\n", row);
        //printf("\n");
        row++;
    }
    fclose (fp);
    return arr;
}

and it stopped at line 5256, out of 100000. The data that it read was also wrong (the data of line 5256 is originally data of line 5276). I don't know where went wrong, any help would be greatly appreciated. Thanks guys!

where it stoppted original data

(I set MAX_LINE_SIZE to 100000. if this information helps)

Upvotes: 0

Views: 73

Answers (1)

Mentat
Mentat

Reputation: 381

I had type this up earlier but delay posting it since I could not find anything in your code to cause an access violation. But now that you identified the issue. I thought I sum up what I and other found wrong with your code. Note most of my observations are mostly style preference and not that you really did anything wrong.

Hera is a breakdown of the problems I see in your code.

File error handling

Since your routine is expected to return a pointer address, you should return a NULL value whenever an error is encountered. This signals to the caller, who should be checking for nulls, that something went wrong. Also, consider using the perror() function. Along with displaying a message you provide, it will also add a reason text for the IO error.

if (fp)
{
   perror("failed to open file for reading.");
   return NULL;
}

Sample output on open error:

failed to open file for reading: file not found.

The MAX_LINE_SIZE and ROW values

The variable use to read each line only needs to be the size of the longest line in the file plus 2 bytes that represent the line terminator and null character (see fgets documentation). It should not be set to the max lines contains in the file. You can over size it, but 100000 is way too big. Not knowing your data, you should consider something smaller like 1024. ROW should be set to the max lines expected to be read in.

#define MAX_LINE_SIZE 1024
#define ROW 100000

Allocation error handling

Always check to see if an allocation routine was successfully. Again, since the caller should be checking for nulls, if any errors are encountered, abort with a null return value.

arr = (int **)malloc(sizeof(int *) * ROW);
if ( arr == NULL ) return NULL;

and here

arr[row] = (int *)malloc(sizeof(int) * column);
if ( arr[row] == NULL ) return NULL;

Trim extra character added by fgets()

There are many ways to accomplish this but you should remove the line terminator from your data. Below is one suggestion in doing this:

if ( strlen(line) > 0 ) line[strlen(line)-1] = 0;

Grouping statements

Try to group statements that perform a certain task together. This will make your code easier to read.

Example 1: Column allocation logic

column =  countComma(line) +1;
arr[row] = (int *)malloc(sizeof(int) * column);
len[row] = column;

Example 2: String parse logic

column = 0;
result = strtok(line, ",");
while ( result != NULL ) {
    arr[row][column] = atoi(result);
    //printf("%d\t", arr[row][column]);
    result = strtok(NULL, ",");
    column++;
}

strtok return type

You should have received a warning for this, but “result” should be defined as a char pointer and not an int pointer. (see strtok documentation)

char *result = NULL;

As mention by my comment below, my version of your code ran fine without most of these suggestions. The suggestions here are just my observations and opinions. Use your own judgment when considering any of these suggestions.

Upvotes: 1

Related Questions