Mockingbird_DX
Mockingbird_DX

Reputation: 3

awk or other shell to convert delimited list into a table

So what I have is a huge csv akin to this:

Pool1,Shard1,Event1,10
Pool1,Shard1,Event2,20
Pool1,Shard2,Event1,30
Pool1,Shard2,Event4,40
Pool2,Shard1,Event3,50

etc

Which is not ealisy readable. Eith there being only 4 types of events I'm useing spreadsheets to convert this into the following:

Pool1,Shard1,10,20,,
Pool1,Shard2,30,,,40
Pool2,Shard1,,,50,

Only events are limited to 4, pools and shards can be indefinite really. But the events may be missing from the lines - not all pools/shards have all 4 events every day.

So I tried doing this within an awk in the shell script that gathers the csv in the first place, but I'm failing spectacuraly, no working code can even be shown since it's producing zero results.

Basically I tried sorting the CSV reading the first two fields of a row, comparing to previous row and if matching comparing the third field to a set array of event strings then storing the fouth field in a variable respective to the event, and one the first two fileds are not matching - finally print the whole line including variables.

Sorry for the one-liner, testing and experimenting directly in the command line. It's embarassing, it does nothing.

awk -F, '{if (a==$1&&b==$2) {if ($3=="Event1") {r=$4} ; if ($3=="Event2") {d=$4} ; if ($3=="Event3") {t=$4} ; if ($3=="Event4") {p=$4}} else {printf $a","$b","$r","$d","$p","$t"\n"; a=$1 ; b=$2 ; if ($3=="Event1") {r=$4} ; if ($3=="Event2") {d=$4} ; if ($3=="Event3") {t=$4} ; if ($3=="Event4") {p=$4} ; a=$1; b=$2}} END {printf "\n"}'

Upvotes: 0

Views: 172

Answers (2)

Ed Morton
Ed Morton

Reputation: 203229

$ cat tst.awk
BEGIN { FS=OFS="," }
{ key = $1 OFS $2 }
key != prev {
    if ( NR>1 ) {
        print prev, f["Event1"], f["Event2"], f["Event3"], f["Event4"]
        delete f
    }
    prev = key
}
{ f[$3] = $4 }
END { print key, f["Event1"], f["Event2"], f["Event3"], f["Event4"] }

$ sort file | awk -f tst.awk
Pool1,Shard1,10,20,,
Pool1,Shard2,30,,,40
Pool2,Shard1,,,50,

Upvotes: 0

boppy
boppy

Reputation: 1908

You could simply use an assoc array: awk -F, -f parse.awk input.csv with parse.awk being:

{
    sub(/Event/, "", $3);
    res[$1","$2][$3]=$4;
}

END {
    for (name in res) {
        printf("%s,%s,%s,%s,%s\n", name, res[name][1], res[name][2], res[name][3], res[name][4])
    }
}

Order could be confused by awk, but my test output is:

Pool2,Shard1,,,50,
Pool1,Shard1,10,20,,
Pool1,Shard2,30,,,40

PS: Please use an editor to write awk source code. Your one-liner is really hard to read. Since I used a different approach, I did not even try do get it "right"... ;)

Upvotes: 2

Related Questions