Bash to reorganize CSV file

Question

I have a series of CSV files with different data elements in them. They are structured like this:

datetime, var1, val1, var2, val2, ..., varx, valx

unfortunately, in some instances there is no var1, and in others, var1 will appear later in the line.

Sample CSV (trimmed to just a few lines, variables)

11/20/2011 3:05:00 AM,HR,115,ST-V,1.2,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:02 AM,HR,119,ST-II,0.1,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:04 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVR,-0.1,
11/20/2011 3:05:06 AM,HR,123,ST-II,0.1,ST-I,0,ST-V,1.2,ST-III,-0.1,
11/20/2011 3:05:08 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVL,0.1,
11/20/2011 3:05:10 AM,ST-V,1.1,ST-III,-0.4,ST-AVR,0,ST-AVL,0.2,
11/20/2011 3:05:12 AM,PVC,0,ST-II,0,ST-I,0,ST-V,1.1,ST-III,-0.4,
11/20/2011 3:05:14 AM,PVC,0,ST-II,0,ST-I,0,APNEA,0,

Ultimately, I'd like to do the following:

Read through the file
Copy the datetime stamp for each line
Find var1
Copy val1
If no var1 exists, create var1, insert NaN to val1
repeat for all variables
save to new csv file

Desired Output (limited to the two sample variables, will expand to include all variables):

11/20/2011 3:05:00 AM,HR,115,PVC,NaN,
11/20/2011 3:05:02 AM,HR,119,PVC,NaN,
11/20/2011 3:05:04 AM,HR,122,PVC,NaN,
11/20/2011 3:05:06 AM,HR,123,PVC,NaN,
11/20/2011 3:05:08 AM,HR,122,PVC,NaN,
11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN,
11/20/2011 3:05:12 AM,HR,NaN,PVC,0,
11/20/2011 3:05:14 AM,HR,NaN,PVC,0,

My progress so far has been limited to the following:

cut -d',' -f1   # pulls the datetime nicely
grep -n -o 'HR,.*' file.csv | cut -f2 -d','    # works on nearly all variables and pulls the variable from the field following the grep term, but skips all empty lines

Any suggestions on how to proceed?

Ed Morton · Accepted Answer

Your question is very muddy but I think this is what youre trying to do:

$ cat tst.awk
BEGIN {
    FS=OFS=","
    numTags = split(tags,tagOrder)
    for (tagNr in tagOrder) {
        tagName = tagOrder[tagNr]
        tagSet[tagName]
    }
}
{
    delete tag2val
    for (fldNr=2; fldNr<=NF; fldNr++) {
        if ($fldNr in tagSet) {
            tag2val[$fldNr] = $(fldNr+1)
        }
    }

    printf "%s%s", $1, OFS
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tagName = tagOrder[tagNr]
        printf "%s%s%s%s", tagName, OFS, (tagName in tag2val ? tag2val[tagName] : "NaN"), (tagNr



If not then edit your question to clarify.

Bash to reorganize CSV file

Answers (1)

Related Questions