Chetan Pawar

Reputation: 25

Split csv file name to get data using C language

I have a C variable, file_path:

"a/b/c/xx.xxx_LOB_xxxx.caseno_YYYYMMDD.seq_no.csv"

From this file_path variable, I want to extract the file name (excluding the path), LOB, caseno, file_date (YYYYMMDD), and seq_no into separate variables using C. I tried strtok() but was not able to get the values.

Can you suggest how to get the value of each variable?

Thank you.

Upvotes: 0

Views: 647

Answers (1)

David C. Rankin

Reputation: 84561

You have several options to separate the string. (You always have several options for parsing strings in C.) You can always use a pair of pointers to work your way down the input string, bracketing and copying any set of characters between the two pointers. (This works even on a non-mutable string, such as a string literal, because the original is never modified.)

You can use strtok() to help break the original up into smaller parts (sometimes into exactly what you need). However, in this case, since '_' is both a delimiter and a character inside one of the values you want to extract (seq_no), you would still need to manually piece together what you need from the tokens strtok() returns. (strtok() modifies the string it operates on, so that string must be mutable.)

A third option is to craft a format string and use sscanf() to parse the variables from the input. Since your format is fixed, you are in luck: you can simply use sscanf() to separate what you need. If you are not intimately familiar with the sscanf() format string and all of its modifiers and conversion specifiers, spend an hour reading and understanding man 3 scanf; the time spent will save you tenfold later.

Your fixed format, assuming no single variable in the string can be longer than 127 characters (adjust as necessary), can be handled with the format string:

    " %*[^_]_%127[^_]%*[^.].%127[^_]_%127[^.].%127[^.]"

This format separates the input into 4 strings. The parts of the string that are not needed are read and discarded using the assignment-suppression character, '*'. If you are separating the input into an array of strings arr, you can write a simple function to handle the separation for you, e.g.

int varsfrompath (char (*arr)[MAXLEN], char *str)
{
    int i = sscanf (str, " %*[^_]_%127[^_]%*[^.].%127[^_]_%127[^.].%127[^.]",
                    arr[0], arr[1], arr[2], arr[3]);
    
    return i == EOF ? 0 : i;   /* return no. of vars separated */
}

The function returns the number of items successfully parsed from the string (zero if an input failure occurs).

A working example would be:

#include <stdio.h>
#include <string.h>

#define NELEM  4
#define MAXLEN 128

int varsfrompath (char (*arr)[MAXLEN], char *str)
{
    int i = sscanf (str, " %*[^_]_%127[^_]%*[^.].%127[^_]_%127[^.].%127[^.]",
                    arr[0], arr[1], arr[2], arr[3]);
    
    return i == EOF ? 0 : i;   /* return no. of vars separated */
}

int main (void) {
    
    char fname[] = "a/b/c/xx.xxx_LOB_xxxx.caseno_YYYYMMDD.seq_no.csv",
        results[NELEM][MAXLEN] = { "" };
    int n = varsfrompath (results, fname);
    
    for (int i = 0; i < n; i++)
        printf ("results[%2d] = '%s'\n", i, results[i]);
}

Example Use/Output

$ ./bin/varsfrompath
results[ 0] = 'LOB'
results[ 1] = 'caseno'
results[ 2] = 'YYYYMMDD'
results[ 3] = 'seq_no'

This is by far the simplest way to handle your fixed format. A manual parse with a pair of pointers is more involved from an accounting standpoint (keeping track of where you are in the string), but no more difficult. (Tedious may be the word.)

Look things over and if I misinterpreted your separation needs, let me know and I can adjust it.

Manual Parse Using a Pair of Pointers

If, rather than spending time with the man 3 scanf page, you would rather spend time with an 8.5x11 sheet of paper and a pencil, with your accounting hat on, to do the same thing using a pair of pointers, you could do something similar to the following.

You have a start pointer sp and an end pointer ep, and you simply work down your line of input, anchoring sp just before the variable to extract and ep on the delimiter just past its end, then use memcpy() to copy the characters between them. You will have to adjust by 1 on occasion, depending on whether sp points at the beginning of the variable you want or one character before it, at the delimiter. (The easy way to get your arithmetic right when working down the string is to consider a case with only 1 character between the start and end pointers; that way it will be clear whether you need to add or subtract 1 to work around your delimiters.)

You can replace the varsfrompath function above with the one that follows and receive the same results, e.g.:

int varsfrompath (char (*arr)[MAXLEN], const char *str)
{
    const char *sp, *ep;    /* start pointer, end pointer */
    int i = 0;
    
    /* set sp to 1st '_' and ep to 2nd '_', copy to arr and nul-terminate */
    if (!(sp = strchr (str, '_')) ||            /* can't find 1st '_' */
        !(ep = strchr (sp + 1, '_')) ||         /* can't find 2nd '_' */
        ep - sp - 1 >= MAXLEN)                  /* chars + nul won't fit */
        return 0;
    memcpy (arr[i], sp + 1, ep - sp - 1);       /* copy ep - sp - 1 chars */
    arr[i++][ep - sp - 1] = 0;                  /* nul-terminate */
    sp = ++ep;                                  /* set sp to 1-past ep */
    
    /* set sp to next '.' and ep to next '_', copy to arr and nul-terminate */
    if (!(sp = strchr (sp, '.')) ||             /* can't find next '.' */
        !(ep = strchr (sp + 1, '_')) ||         /* can't find next '_' */
        ep - sp - 1 >= MAXLEN)                  /* chars + nul won't fit */
        return i;
    memcpy (arr[i], sp + 1, ep - sp - 1);       /* copy ep - sp - 1 chars */
    arr[i++][ep - sp - 1] = 0;                  /* nul-terminate */
    sp = ++ep;                                  /* set sp to 1-past ep */
    
    /* set ep to next '.', copy to arr and nul-terminate */
    if (!(ep = strchr (sp, '.')) || ep - sp >= MAXLEN)  /* no '.' or too long */
        return i;
    memcpy (arr[i], sp, ep - sp);               /* copy ep - sp chars */
    arr[i++][ep - sp] = 0;                      /* nul-terminate */
    sp = ++ep;                                  /* set sp to 1-past ep */
    
    /* repeat exact same steps for last var */
    if (!(ep = strchr (sp, '.')) || ep - sp >= MAXLEN)
        return i;
    memcpy (arr[i], sp, ep - sp);
    arr[i++][ep - sp] = 0;
    
    return i;   /* return no. of vars separated */
}

It may look more complicated, but you are actually just using simple string functions like strchr() to position the pointers, and then just extracting the characters between them. Compare and contrast both approaches.

Upvotes: 1
