Tom Fogarty
Tom Fogarty

Reputation: 67

Bash to reorganize CSV file

I have a series of CSV files with different data elements in them. They are structured like this:

datetime, var1, val1, var2, val2, ..., varx, valx

unfortunately, in some instances there is no var1, and in others, var1 will appear later in the line.

Sample CSV (trimmed to just a few lines, variables)

11/20/2011 3:05:00 AM,HR,115,ST-V,1.2,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:02 AM,HR,119,ST-II,0.1,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:04 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVR,-0.1,
11/20/2011 3:05:06 AM,HR,123,ST-II,0.1,ST-I,0,ST-V,1.2,ST-III,-0.1,
11/20/2011 3:05:08 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVL,0.1,
11/20/2011 3:05:10 AM,ST-V,1.1,ST-III,-0.4,ST-AVR,0,ST-AVL,0.2,
11/20/2011 3:05:12 AM,PVC,0,ST-II,0,ST-I,0,ST-V,1.1,ST-III,-0.4,
11/20/2011 3:05:14 AM,PVC,0,ST-II,0,ST-I,0,APNEA,0,

Ultimately, I'd like to do the following:

  1. Read through the file
  2. Copy the datetime stamp for each line
  3. Find var1
  4. Copy val1
  5. If no var1 exists, create var1, insert NaN to val1
  6. repeat for all variables
  7. save to new csv file

Desired Output (limited to the two sample variables, will expand to include all variables):

11/20/2011 3:05:00 AM,HR,115,PVC,NaN,
11/20/2011 3:05:02 AM,HR,119,PVC,NaN,
11/20/2011 3:05:04 AM,HR,122,PVC,NaN,
11/20/2011 3:05:06 AM,HR,123,PVC,NaN,
11/20/2011 3:05:08 AM,HR,122,PVC,NaN,
11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN,
11/20/2011 3:05:12 AM,HR,NaN,PVC,0,
11/20/2011 3:05:14 AM,HR,NaN,PVC,0,

My progress so far has been limited to the following:

cut -d',' -f1   # pulls the datetime nicely
grep -n -o 'HR,.*' file.csv | cut -f2 -d','    # works on nearly all variables and pulls the variable from the field following the grep term, but skips all empty lines

Any suggestions on how to proceed?

Upvotes: 0

Views: 68

Answers (1)

Ed Morton
Ed Morton

Reputation: 203254

Your question is very muddy but I think this is what youre trying to do:

$ cat tst.awk
BEGIN {
    FS=OFS=","
    numTags = split(tags,tagOrder)
    for (tagNr in tagOrder) {
        tagName = tagOrder[tagNr]
        tagSet[tagName]
    }
}
{
    delete tag2val
    for (fldNr=2; fldNr<=NF; fldNr++) {
        if ($fldNr in tagSet) {
            tag2val[$fldNr] = $(fldNr+1)
        }
    }

    printf "%s%s", $1, OFS
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tagName = tagOrder[tagNr]
        printf "%s%s%s%s", tagName, OFS, (tagName in tag2val ? tag2val[tagName] : "NaN"), (tagNr<numTags?OFS:ORS)
    }
}

$ awk -v tags='HR,PVC' -f tst.awk file
11/20/2011 3:05:00 AM,HR,115,PVC,NaN
11/20/2011 3:05:02 AM,HR,119,PVC,NaN
11/20/2011 3:05:04 AM,HR,122,PVC,NaN
11/20/2011 3:05:06 AM,HR,123,PVC,NaN
11/20/2011 3:05:08 AM,HR,122,PVC,NaN
11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN
11/20/2011 3:05:12 AM,HR,NaN,PVC,0
11/20/2011 3:05:14 AM,HR,NaN,PVC,0

If not then edit your question to clarify.

Upvotes: 2

Related Questions