tesolat
tesolat

Reputation: 81

Conditional transposition in awk based on column values

I'm trying to make the below transformation using awk.

Input:

status,parent,child,date
first,foo,bar,2019-01-01
NULL,foo,bar,2019-01-02
NULL,foo,bar,2019-01-03
last,foo,bar,2019-01-04
NULL,foo,bar,2019-01-05
blah,foo,bar,2019-01-06
NULL,foo,bar,2019-01-07
first,bif,baz,2019-01-02
NULL,bif,baz,2019-01-03
last,bif,baz,2019-01-04

Expected output:

parent,child,first,last
foo,bar,2019-01-01,2019-01-04
bif,baz,2019-01-02,2019-01-04

I'm pretty stumped by this problem, and haven't got anything to show yet - any pointers would be very helpful.

Upvotes: 0

Views: 126

Answers (3)

Ed Morton
Ed Morton

Reputation: 204721

$ cat tst.awk
BEGIN { FS=OFS="," }
{ key = $2 OFS $3 }
FNR==1 { print key, "first", "last" }
$1=="first" { first[key] = $4 }
$1=="last"  { print key, first[key], $4 }

$ awk -f tst.awk file
parent,child,first,last
foo,bar,2019-01-01,2019-01-04
bif,baz,2019-01-02,2019-01-04

If you can have a first without a last or vice-versa or they can occur out of order then include those cases in the example in your question.

Upvotes: 1

ghoti
ghoti

Reputation: 46896

Not awk, you already have that, but here's an option in bash alone, just for kicks.

#!/usr/bin/env bash

declare -A first=()

printf 'parent,child,first,last\n'

while IFS=, read pos a b date; do
  case "$pos" in
    first) first[$a,$b]=$date ;;
    last) printf "%s,%s,%s,%s\n" "$a" "$b" "${first[$a,$b]}" "$date" ;;
  esac
done < input.csv

Requires bash 4+ for the associative array.

Upvotes: 1

RavinderSingh13
RavinderSingh13

Reputation: 133770

Could you please try following.

awk '
BEGIN{
  FS=OFS=SUBSEP=","
  print "parent,child,first,last"
}
$1=="first" || $1=="last"{
  a[$1,$2,$3]=$NF
  b[$2,$3]
}
END{
  for(i in b){
     print i,a["first",i],a["last",i]
  }
}
'  Input_file

Output will be as follows.

parent,child,first,last
bif,baz,2019-01-02,2019-01-04
foo,bar,2019-01-01,2019-01-04

Explanation: Adding detailed explanation for above code.

awk '                                       ##Starting awk program from here.
BEGIN{                                      ##Starting BEGIN section from here.
  FS=OFS=SUBSEP=","                         ##Setting Fs, OFS and SUBSEP as comma here.
  print "parent,child,first,last"           ##Printing header values as per OP request here.
}                                           ##Closing BEGIN BLOCK for this progam here.
$1=="first" || $1=="last"{                  ##Checking condition if $1 is either string first or last then do following.
  a[$1,$2,$3]=$NF                           ##Creating an array named a whose index is $1,$2,$3 and its value is $NF(last column of current line).
  b[$2,$3]                                  ##Creating an array named b whose index is $2,$3 from current line.
}                                           ##Closing main BLOCK for main program here.
END{                                        ##Starting END BLOCK for this awk program.
  for(i in b){                              ##Starting a for loop to traverse through array here.
     print i,a["first",i],a["last",i]       ##Printing variable it, array a with index of "first",i and value of array b with index of "last",i.
  }                                         ##Closing BLOCK for, for loop here.
}                                           ##Closing BLOCK for END block for this awk program here.
'  Input_file                               ##Mentioning Input_file name here.

Upvotes: 2

Related Questions