Reputation: 67
I have a series of CSV files with different data elements in them. They are structured like this:
datetime, var1, val1, var2, val2, ..., varx, valx
unfortunately, in some instances there is no var1, and in others, var1 will appear later in the line.
Sample CSV (trimmed to just a few lines, variables)
11/20/2011 3:05:00 AM,HR,115,ST-V,1.2,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:02 AM,HR,119,ST-II,0.1,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:04 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVR,-0.1,
11/20/2011 3:05:06 AM,HR,123,ST-II,0.1,ST-I,0,ST-V,1.2,ST-III,-0.1,
11/20/2011 3:05:08 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVL,0.1,
11/20/2011 3:05:10 AM,ST-V,1.1,ST-III,-0.4,ST-AVR,0,ST-AVL,0.2,
11/20/2011 3:05:12 AM,PVC,0,ST-II,0,ST-I,0,ST-V,1.1,ST-III,-0.4,
11/20/2011 3:05:14 AM,PVC,0,ST-II,0,ST-I,0,APNEA,0,
Ultimately, I'd like to do the following:
Desired Output (limited to the two sample variables, will expand to include all variables):
11/20/2011 3:05:00 AM,HR,115,PVC,NaN, 11/20/2011 3:05:02 AM,HR,119,PVC,NaN, 11/20/2011 3:05:04 AM,HR,122,PVC,NaN, 11/20/2011 3:05:06 AM,HR,123,PVC,NaN, 11/20/2011 3:05:08 AM,HR,122,PVC,NaN, 11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN, 11/20/2011 3:05:12 AM,HR,NaN,PVC,0, 11/20/2011 3:05:14 AM,HR,NaN,PVC,0,
My progress so far has been limited to the following:
cut -d',' -f1 # pulls the datetime nicely
grep -n -o 'HR,.*' file.csv | cut -f2 -d',' # works on nearly all variables and pulls the variable from the field following the grep term, but skips all empty lines
Any suggestions on how to proceed?
Upvotes: 0
Views: 68
Reputation: 203254
Your question is very muddy but I think this is what youre trying to do:
$ cat tst.awk
BEGIN {
FS=OFS=","
numTags = split(tags,tagOrder)
for (tagNr in tagOrder) {
tagName = tagOrder[tagNr]
tagSet[tagName]
}
}
{
delete tag2val
for (fldNr=2; fldNr<=NF; fldNr++) {
if ($fldNr in tagSet) {
tag2val[$fldNr] = $(fldNr+1)
}
}
printf "%s%s", $1, OFS
for (tagNr=1; tagNr<=numTags; tagNr++) {
tagName = tagOrder[tagNr]
printf "%s%s%s%s", tagName, OFS, (tagName in tag2val ? tag2val[tagName] : "NaN"), (tagNr<numTags?OFS:ORS)
}
}
$ awk -v tags='HR,PVC' -f tst.awk file
11/20/2011 3:05:00 AM,HR,115,PVC,NaN
11/20/2011 3:05:02 AM,HR,119,PVC,NaN
11/20/2011 3:05:04 AM,HR,122,PVC,NaN
11/20/2011 3:05:06 AM,HR,123,PVC,NaN
11/20/2011 3:05:08 AM,HR,122,PVC,NaN
11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN
11/20/2011 3:05:12 AM,HR,NaN,PVC,0
11/20/2011 3:05:14 AM,HR,NaN,PVC,0
If not then edit your question to clarify.
Upvotes: 2