user3746195

Reputation: 346

AWK - Match Field 1, Paste Field 2 of All Matching Columns in Same Row

I can do the following in Excel, but it is very inefficient. Can someone help me write this logic in AWK, since it is the language I am learning for file parsing?

Logic

Match on field 1. For each distinct value of field 1, print that value once, followed by field 4 from every row that shares it, including the first row of the group.

Input File:

ASHBBPRJ01-ASHBBPRJ02-BE    ASHBBPRJ01.RD.AS    1   ASHBBBRJ01.RD.AS    ae1.0       strict
ASHBBPRJ01-ASHBBPRJ02-BE    ASHBBPRJ01.RD.AS    2   ASHBBPRJ02.RD.AS    ae1.0       strict
ASHBBPRJ01-BSTNRCRJ01-BE    ASHBBPRJ01.RD.AS    1   ASHBBBRJ01.RD.AS    ae1.0       strict
ASHBBPRJ01-BSTNRCRJ01-BE    ASHBBPRJ01.RD.AS    2   NYRKBBRJ02.RD.NY    ae5.0       strict
ASHBBPRJ01-BSTNRCRJ01-BE    ASHBBPRJ01.RD.AS    3   NYRKBBRJ01.RD.NY    ae2.0       strict
ASHBBPRJ01-BSTNRCRJ01-BE    ASHBBPRJ01.RD.AS    4   PROVBBRJ02.RD.RI    ae3.0       strict
ASHBBPRJ01-BSTNRCRJ01-BE    ASHBBPRJ01.RD.AS    5   PROVDSRJ02.RD.RI    ae0.0       strict
ASHBBPRJ01-BSTNRCRJ01-BE    ASHBBPRJ01.RD.AS    6   BSTNRCRJ01.RD.RI    ae2.0       strict
ASHBBPRJ01-BSTNRCRJ02-BE    ASHBBPRJ01.RD.AS    1   ASHBBBRJ01.RD.AS    ae1.0       strict
ASHBBPRJ01-BSTNRCRJ02-BE    ASHBBPRJ01.RD.AS    2   NYRKBBRJ02.RD.NY    ae5.0       strict
ASHBBPRJ01-BSTNRCRJ02-BE    ASHBBPRJ01.RD.AS    3   NYRKBBRJ01.RD.NY    ae2.0       strict
ASHBBPRJ01-BSTNRCRJ02-BE    ASHBBPRJ01.RD.AS    4   PROVBBRJ02.RD.RI    ae3.0       strict
ASHBBPRJ01-BSTNRCRJ02-BE    ASHBBPRJ01.RD.AS    5   PROVDSRJ02.RD.RI    ae0.0       strict
ASHBBPRJ01-BSTNRCRJ02-BE    ASHBBPRJ01.RD.AS    6   BSTNRCRJ02.RD.RI    ae1.0       strict

Output

ASHBBPRJ01-ASHBBPRJ02-BE    ASHBBBRJ01.RD.AS    ASHBBPRJ02.RD.AS
ASHBBPRJ01-BSTNRCRJ01-BE    ASHBBBRJ01.RD.AS    NYRKBBRJ02.RD.NY    NYRKBBRJ01.RD.NY    PROVBBRJ02.RD.RI    PROVDSRJ02.RD.RI    BSTNRCRJ01.RD.RI
ASHBBPRJ01-BSTNRCRJ02-BE    ASHBBBRJ01.RD.AS    NYRKBBRJ02.RD.NY    NYRKBBRJ01.RD.NY    PROVBBRJ02.RD.RI    PROVDSRJ02.RD.RI    BSTNRCRJ02.RD.RI

Upvotes: 0

Views: 126

Answers (2)

karakfa

Reputation: 67507

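Your input is already sorted by field 1, so rows with the same key are adjacent and each output line can be built in a single pass: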

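$ awk '$1==p {line=line OFS $4; next}                # same key as the previous row: append field 4
       {if(NR>1) print line; p=$1; line=$1 OFS $4}   # new key: print the finished line, start the next
       END{if(NR) print line}' file                  # flush the last group (nothing on empty input)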
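
The single-pass approach depends on equal keys being adjacent. If the file is not sorted, one option is to sort it first. The following is a minimal sketch, not part of the original answer: the function name `group_sorted` is hypothetical, and it assumes field 3 is a per-group sequence number (as it appears to be in the sample input), so it is used as a secondary numeric sort key to keep field 4 values in their original order within each group.

```shell
# Hypothetical helper: reads rows on stdin, sorts them so that equal keys
# are adjacent, then joins field 4 of each group onto one line.
group_sorted() {
    # Sort by field 1, then numerically by field 3 (assumed sequence number).
    sort -k1,1 -k3,3n |
    awk '
        $1 != prev {                    # first row of a new group
            if (NR > 1) print line      # flush the previous group
            prev = $1; line = $1
        }
        { line = line OFS $4 }          # append field 4 of every row
        END { if (NR) print line }      # flush the last group
    '
}

# Usage: rows deliberately out of order
printf '%s\n' \
    'K2 x 2 B2 ae strict' \
    'K1 x 1 A1 ae strict' \
    'K2 x 1 B1 ae strict' | group_sorted
```

With the three sample rows above, this should print `K1 A1` followed by `K2 B1 B2`. Sorting changes the order of the groups themselves, so it only fits when key order in the output does not need to match the input.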

Upvotes: 1

RavinderSingh13

Reputation: 133518

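The following awk command collects the field 4 values for each key in an array and prints every key with its values at the end. Note that for(i in a) visits the keys in an unspecified order: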

awk '{a[$1]=a[$1]?a[$1] OFS $4:$4} END{for(i in a){print i,a[i]}}'  Input_file

If you want the output in the same order as the keys first appear in Input_file, use the following instead:

awk '!b[$1]++{c[++i]=$1} {a[$1]=a[$1]?a[$1] OFS $4:$4} END{for(j=1;j<=i;j++){print c[j],a[c[j]]}}'  Input_file
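
The expected output in the question looks tab-separated, whereas awk's default OFS is a single space. As a hedged variation on the order-preserving command (the function name `group_keep_order` and the array names are made up for illustration), the sketch below sets OFS to a tab and tests key membership with `in` rather than the truthiness of the stored string, so a field 4 value such as 0 would not be overwritten.

```shell
# Hypothetical helper: reads rows on stdin, groups field 4 by field 1,
# keeps first-seen key order, and writes tab-separated lines.
group_keep_order() {
    awk -v OFS='\t' '
        !($1 in vals) {                 # first time this key is seen
            order[++n] = $1             # remember its position
            vals[$1] = $4               # start its value list
            next
        }
        { vals[$1] = vals[$1] OFS $4 }  # later rows: append field 4
        END {
            for (j = 1; j <= n; j++)    # replay keys in first-seen order
                print order[j], vals[order[j]]
        }
    '
}

# Usage: K2 appears first, so it is printed first
printf '%s\n' \
    'K2 x 1 B1 ae strict' \
    'K1 x 1 A1 ae strict' \
    'K2 x 2 B2 ae strict' | group_keep_order
```

Because the values are buffered until END, this needs memory proportional to the number of distinct keys and values, but it does not require sorted input.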

Upvotes: 1
