Reputation: 58
This is my imput txt file
2013121612,HCDC,0
2013121613,HCDC,84
2013121614,HCDC,100
2013121615,HCDC,98
2013121612,MSLP,1023.83
2013121613,MSLP,1023.02
2013121614,MSLP,1022.08
2013121615,MSLP,1021.61
2013121612,MAXT,12.723
2013121613,MAXT,13.412
2013121614,MAXT,13.41
2013121615,MAXT,12.482
this is my BAD or INSUFFICIENT code
awk -F"," '/MAXT|HCDC|MSLP/ {print $1,"\t",$3,"\t",$3,"\t",$3}' input.txt >> ouput.txt
and this is de output file
DATE MAXT HCDC MSLP
2013121612 0 0 0
2013121613 84 84 84
2013121614 100 100 100
2013121615 98 98 98
2013121612 1023.03 1023.03 1023.03
2013121613 1023.02 1023.02 1023.02
2013121614 1022.08 1022.08 1022.08
2013121615 1020.84 1020.84 1020.84
2013121612 12.723 12.723 12.723
2013121613 13.412 13.412 13.412
2013121614 13.41 13.41 13.41
2013121615 12.482 12.482 12.482
What I need is this output format…
DATE MAXT HCDC MSLP
2013121612 12.723 0 1023.03
2013121613 13.412 84 1023.02
2013121614 13.41 100 1022.08
2013121615 12.482 98 1020.84
I am forced to ask for help because my knowledge of unix are very few
thank you very much
Upvotes: 2
Views: 84
Reputation: 5711
You are basically trying to pivot the table, reshaping it using two columns. You can use a specialized language for that (R is very good at such tasks). awk
is not the best language for such jobs (although it is surely possible to do using it). I'd recommend rewriting it in Python, which might be a bit easier. The outline (no error checking and such) of the code is below:
tbl = {} # map date to a dict of colname->values
# ingest the data
for line in myfile:
rec = line.split()
if rec[0] not in tbl:
tbl[rec[0]] = {}
tbl[rec[0]][rec[1]] = double(rec[2])
# output the table
for date in tbl:
print date, tbl[date]['MAXT'], tbl[date]['HCDC'], tbl[date]['MSLP']
Note that it might be even easier (practically a two-liner) using NumPy but I'm not sure it is worth making this a dependency for such a small task.
Upvotes: 1
Reputation: 23374
awk -F, '!($1 in seen){dr[++i]=$1};{d=$1; v=$3; $0=$2; seen[d]++};
/HCDC/{HCDC[d]=v}; /MSLP/{MSLP[d]=v};/MAXT/{MAXT[d]=v};
END{print "DATE", "MAXT", "HCDC", "MSLP";
for (j=1; j<=i; ++j) {print dr[j], (dr[j] in MAXT)? MAXT[dr[j]]: 0,
(dr[j] in HCDC)? HCDC[dr[j]]: 0,
(dr[j] in MSLP)? MSLP[dr[j]]: 0}}' input.txt
DATE MAXT HCDC MSLP
2013121612 12.723 0 1023.83
2013121613 13.412 84 1023.02
2013121614 13.41 100 1022.08
2013121615 12.482 98 1021.61
Upvotes: 1
Reputation: 246754
Here's awk:
awk -F, '
{
key[$1] = 1
data[$1,$2] = $3
}
END {
print "DATE","MAXT","HCDC","MSLP"
for (k in key)
print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"]
}
' input.txt | column -t
DATE MAXT HCDC MSLP
2013121612 12.723 0 1023.83
2013121613 13.412 84 1023.02
2013121614 13.41 100 1022.08
2013121615 12.482 98 1021.61
Because I'm using associative arrays, the order of the keys is not guaranteed. If you need to sort the output, so something like this bash code:
{
echo DATE MAXT HCDC MSLP
awk -F, '
{ key[$1] = 1; data[$1,$2] = $3 }
END { for (k in key) print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"] }
' input.txt | sort
} | column -t
Upvotes: 2