MZGhibli
MZGhibli

Reputation: 11

How to get rows of a csv file into a linkedlist iteratively?

I am asked to extract rows of a csv file into a linkedlist, every node of the linkedlist will store one row of the data, where the "data" pointer inside the linkedlist points towards a struct, which has every columns of information (like double heading1; char *heading2 etc.) There may be quotes around values that are separated by commas and they should be treated as a whole as the data of that particular column. For example, one field of heading 3 may contain data such as "1, 2, 3" where "1, 2, 3" should be all displayed under heading3, instead of as 3 separate columns.

What I tried is to use two while loops:

//defining data structure here
typedef struct node{
    suburb_t *data;
    struct node *next;
}Node_t;
//function definition here...
while(fscanf("%d,%d,%d", &x, &y, &z) == 3){
   Node_t node = (Node_t *)malloc(sizeof(Node_t));
   node->data = (suburb_t *)malloc(sizeof(suburb_t));
   node->data->content1 = x
   //and so on
    while(fscanf(.....) == ...){
        //similar to first line
    }
  //and here I can not figure out how to keep creating new nodes, moving the current
  //node to the next node until the outer while loop terminates.
}

The reason why I used double while loop is due to the existence of comma separated values quoted inside quotation marks, which I can then use the second while loop to read the quotation mark, store it inside a variable and read the content as a whole, until the end quotation markk is encountered. We can assume value with quotation marks will not be the first few values...

And my question is how do I create nodes, link it with the previous one iteratively until the first while loop terminates, that is either an EOF is reached or lesser than 3 values are read? My question is mainly towards the end of the code snippet I provided, I think I need to add something within that area but I just don't know what to add....

Upvotes: 1

Views: 106

Answers (2)

Serge Ballesta
Serge Ballesta

Reputation: 149105

A csv file is logically a container of rows, each row containing fields. The hard part in writing a general csv parser is that a field is allowed to contain separator and newline characters provided it is enclosed in quotation marks.

This is far beyond what the scanf family can do...

If you know the structure of the csv file (number and type of all fields) and you know that no field can contain a comma nor a new line, then it can be done through a single loop. Pseudocode:

read line by line until end of file
    per line
    alloc a new node
    link it to previous one
    parse the line into the node

Still that pseudo-code contains a number of caveats:

  • adding at the end of a singly linked list requires to keep 2 pointers, one for the head and one for the tail; moreover the handling of the first node requires caution
  • if some fields can be strings, you must allocate some memory for each of them (unless the maximum size of the fields is known and you can use true arrays in your suburb_t structure)

In the general case, you have to read the file one character at a time and build a list of logical rows (each can use more than one line...) each containing a list of fields. Not really complex but way too long for a SO answer.

TL/DR: If you have no problem in parsing a row, your code could become:

// put here the maximum size of a line of your csv file + 1 (or slightly more if you want to be super safe
#define MAXLINE 256

typedef struct node{
    suburb_t data;     // no need for indirection here...
    struct node *next;
}Node_t;
//function definition here...
FILE* fd = fopen(...);  // open the csv file
// tests for open error omitted for brievety
char line[MAXLINE];

Node_t *first, **last = &first
while(NULL != fgets(line, sizeof(line), fd)) {
   // test for empty line omitted for brievety
   Node_t* node = malloc(sizeof(Node_t));  // never cast malloc in C!
   node->next = NULL;
   // parse the line into the data member of *node
   ...
   *last = node;
   last = &(node->next);
}
fclose(fd);

But beware, the above code has not been tested and can contain a number of typos or other stupid mistakes...

Upvotes: 0

xing
xing

Reputation: 2528

Since no details are provided for suburb_t this is just an example of linking a list by adding the new node to the end of the list.
Pointers are needed to keep track of the first and last nodes.

#include <stdio.h>
#include <stdlib.h>

#define SIZE 15

typedef struct Node node;
struct Node {
    int val;
    node *next;
};

node *newnode ( int value) {
    node *item = NULL;
    if ( NULL == ( item = malloc ( sizeof *item))) {
        return NULL;
    }
    item->val = value;
    item->next = NULL; // sentinel to terminate list
    return item;
}

void addnode ( node **list, node **last, node *add) {
    if ( NULL == *list) { // empty list
        *list = add;
        *last = add;
    }
    else {
        (*last)->next = add;
        *last = add;
    }
}

void showlist ( node *list) {
    while ( list) {
        printf ( "%2d ", list->val);
        list = list->next;
    }
    printf ( "\n");
}

node *freelist ( node *list) {

    while ( list) {
        node *tmp = list;
        list = list->next;
        free ( tmp);
    }
    return list;
}

int main ( void) {
    node *list = NULL;
    node *last = NULL;
    node *add = NULL;

    for ( int each = 0; each < SIZE; ++each) {
        if ( ( add = newnode ( each))) { // create node
            addnode ( &list, &last, add); // add created node to end of list
        }
    }

    showlist ( list);
    list = freelist ( list);

    return 0;
}

Upvotes: 1

Related Questions