raposu
raposu

Reputation: 58

awk,creating columns for different values

This is my imput txt file

2013121612,HCDC,0
2013121613,HCDC,84
2013121614,HCDC,100
2013121615,HCDC,98
2013121612,MSLP,1023.83
2013121613,MSLP,1023.02
2013121614,MSLP,1022.08
2013121615,MSLP,1021.61
2013121612,MAXT,12.723
2013121613,MAXT,13.412
2013121614,MAXT,13.41
2013121615,MAXT,12.482

this is my BAD or INSUFFICIENT code

awk -F"," '/MAXT|HCDC|MSLP/ {print $1,"\t",$3,"\t",$3,"\t",$3}' input.txt >> ouput.txt

and this is de output file

DATE    MAXT    HCDC    MSLP    
2013121612   0   0   0
2013121613   84      84      84
2013121614   100     100     100
2013121615   98      98      98
2013121612   1023.03     1023.03     1023.03
2013121613   1023.02     1023.02     1023.02
2013121614   1022.08     1022.08     1022.08
2013121615   1020.84     1020.84     1020.84
2013121612   12.723      12.723      12.723
2013121613   13.412      13.412      13.412
2013121614   13.41           13.41       13.41
2013121615   12.482      12.482      12.482

What I need is this output format…

DATE    MAXT    HCDC    MSLP  
2013121612   12.723       0      1023.03
2013121613   13.412          84      1023.02
2013121614   13.41          100      1022.08
2013121615   12.482      98      1020.84

I am forced to ask for help because my knowledge of unix are very few

thank you very much

Upvotes: 2

Views: 84

Answers (3)

Alexander L. Belikoff
Alexander L. Belikoff

Reputation: 5711

You are basically trying to pivot the table, reshaping it using two columns. You can use a specialized language for that (R is very good at such tasks). awk is not the best language for such jobs (although it is surely possible to do using it). I'd recommend rewriting it in Python, which might be a bit easier. The outline (no error checking and such) of the code is below:

tbl = {}       # map date to a dict of colname->values

# ingest the data

for line in myfile:
    rec = line.split()

    if rec[0] not in tbl:
        tbl[rec[0]] = {}

    tbl[rec[0]][rec[1]] = double(rec[2])

# output the table

for date in tbl:
    print date, tbl[date]['MAXT'], tbl[date]['HCDC'], tbl[date]['MSLP']

Note that it might be even easier (practically a two-liner) using NumPy but I'm not sure it is worth making this a dependency for such a small task.

Upvotes: 1

iruvar
iruvar

Reputation: 23374

awk -F, '!($1 in seen){dr[++i]=$1};{d=$1; v=$3; $0=$2; seen[d]++};
    /HCDC/{HCDC[d]=v}; /MSLP/{MSLP[d]=v};/MAXT/{MAXT[d]=v};
    END{print "DATE", "MAXT", "HCDC", "MSLP"; 
        for (j=1; j<=i; ++j) {print dr[j], (dr[j] in MAXT)? MAXT[dr[j]]: 0,
                                 (dr[j] in HCDC)? HCDC[dr[j]]: 0,
                                 (dr[j] in MSLP)? MSLP[dr[j]]: 0}}' input.txt

DATE MAXT HCDC MSLP
2013121612 12.723 0 1023.83
2013121613 13.412 84 1023.02
2013121614 13.41 100 1022.08
2013121615 12.482 98 1021.61

Upvotes: 1

glenn jackman
glenn jackman

Reputation: 246754

Here's awk:

awk -F, '
    {
        key[$1] = 1
        data[$1,$2] = $3
    } 
    END {
        print "DATE","MAXT","HCDC","MSLP"
        for (k in key)
            print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"]
    }
' input.txt | column -t
DATE        MAXT    HCDC  MSLP
2013121612  12.723  0     1023.83
2013121613  13.412  84    1023.02
2013121614  13.41   100   1022.08
2013121615  12.482  98    1021.61

Because I'm using associative arrays, the order of the keys is not guaranteed. If you need to sort the output, so something like this bash code:

{
    echo DATE MAXT HCDC MSLP
    awk -F, '
        { key[$1] = 1; data[$1,$2] = $3 }
        END { for (k in key) print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"] }
    ' input.txt | sort
} | column -t

Upvotes: 2

Related Questions