Reputation: 346
I can do the following in Excel, but it is very inefficient. Can someone help me write this logic in AWK, since it is the language I am learning for file parsing?
Logic
Group rows by field 1. For each distinct value of field 1, print that value once, followed by field 4 from every row that shares it, including the first row of the group.
Input File:
ASHBBPRJ01-ASHBBPRJ02-BE ASHBBPRJ01.RD.AS 1 ASHBBBRJ01.RD.AS ae1.0 strict
ASHBBPRJ01-ASHBBPRJ02-BE ASHBBPRJ01.RD.AS 2 ASHBBPRJ02.RD.AS ae1.0 strict
ASHBBPRJ01-BSTNRCRJ01-BE ASHBBPRJ01.RD.AS 1 ASHBBBRJ01.RD.AS ae1.0 strict
ASHBBPRJ01-BSTNRCRJ01-BE ASHBBPRJ01.RD.AS 2 NYRKBBRJ02.RD.NY ae5.0 strict
ASHBBPRJ01-BSTNRCRJ01-BE ASHBBPRJ01.RD.AS 3 NYRKBBRJ01.RD.NY ae2.0 strict
ASHBBPRJ01-BSTNRCRJ01-BE ASHBBPRJ01.RD.AS 4 PROVBBRJ02.RD.RI ae3.0 strict
ASHBBPRJ01-BSTNRCRJ01-BE ASHBBPRJ01.RD.AS 5 PROVDSRJ02.RD.RI ae0.0 strict
ASHBBPRJ01-BSTNRCRJ01-BE ASHBBPRJ01.RD.AS 6 BSTNRCRJ01.RD.RI ae2.0 strict
ASHBBPRJ01-BSTNRCRJ02-BE ASHBBPRJ01.RD.AS 1 ASHBBBRJ01.RD.AS ae1.0 strict
ASHBBPRJ01-BSTNRCRJ02-BE ASHBBPRJ01.RD.AS 2 NYRKBBRJ02.RD.NY ae5.0 strict
ASHBBPRJ01-BSTNRCRJ02-BE ASHBBPRJ01.RD.AS 3 NYRKBBRJ01.RD.NY ae2.0 strict
ASHBBPRJ01-BSTNRCRJ02-BE ASHBBPRJ01.RD.AS 4 PROVBBRJ02.RD.RI ae3.0 strict
ASHBBPRJ01-BSTNRCRJ02-BE ASHBBPRJ01.RD.AS 5 PROVDSRJ02.RD.RI ae0.0 strict
ASHBBPRJ01-BSTNRCRJ02-BE ASHBBPRJ01.RD.AS 6 BSTNRCRJ02.RD.RI ae1.0 strict
Output
ASHBBPRJ01-ASHBBPRJ02-BE ASHBBBRJ01.RD.AS ASHBBPRJ02.RD.AS
ASHBBPRJ01-BSTNRCRJ01-BE ASHBBBRJ01.RD.AS NYRKBBRJ02.RD.NY NYRKBBRJ01.RD.NY PROVBBRJ02.RD.RI PROVDSRJ02.RD.RI BSTNRCRJ01.RD.RI
ASHBBPRJ01-BSTNRCRJ02-BE ASHBBBRJ01.RD.AS NYRKBBRJ02.RD.NY NYRKBBRJ01.RD.NY PROVBBRJ02.RD.RI PROVDSRJ02.RD.RI BSTNRCRJ02.RD.RI
Upvotes: 0
Views: 126
Reputation: 67507
Your input is already sorted by field 1, so each output line can be built while reading consecutive rows:
$ awk '{if($1==p) line=line OFS $4;
        else {if(line) print line; p=$1; line=$1 OFS $4}}
   END{if(line) print line}' file
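The one-pass approach above relies on rows with the same field 1 being adjacent. If that cannot be assumed, one option is to sort on field 1 first. The following is a minimal sketch, assuming a sort that supports the stable -s flag (common in GNU and BSD sort, but not required by POSIX); the K1/K2 rows are made-up sample data, not the question's file:

```shell
# Hypothetical unsorted sample: rows with the same key are not adjacent.
out=$(printf '%s\n' 'K2 x 1 B y z' 'K1 x 1 A y z' 'K2 x 2 D y z' 'K1 x 2 C y z' |
  # Sort on field 1 only; -s keeps rows with equal keys in their input order.
  sort -s -k1,1 |
  awk '{ if ($1 == p) { line = line OFS $4 }
         else { if (line) print line; p = $1; line = $1 OFS $4 } }
       END { if (line) print line }')
printf '%s\n' "$out"
```

Without -s, sort may reorder rows within a group, which would change the order of the field 4 values on each output line.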
Upvotes: 1
Reputation: 133518
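The following awk command may help. It collects the field 4 values for each field 1 key in an array and prints them at the end; note that for(i in a) visits the keys in an unspecified order.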
awk '{a[$1]=a[$1]?a[$1] OFS $4:$4} END{for(i in a){print i,a[i]}}' Input_file
If you want the output in the same order in which the keys first appear in Input_file, the following may help.
awk '!b[$1]++{c[++i]=$1} {a[$1]=a[$1]?a[$1] OFS $4:$4} END{for(j=1;j<=i;j++){print c[j],a[c[j]]}}' Input_file
Upvotes: 1