user1308144

Reputation: 475

pattern matching within an array

I want to update rows in fileA with data from fileB, based on a match between a single field in fileB and any of four fields in fileA. In fileA the value to match is the first element of a colon-separated array stored in each of those fields.

fileA looks like the sample below. $3, $4, $5 and $6 are the fields I search for the match; each one is either "NM" or an array of three elements separated by ":".

H01 x001 NM NM NM NM
H01 f005 NM s10|001:60:50 NM s10|001:500:709
H06 x989 NM NM NM s560|999:70:500
H79 r679 s560|999:1000:1100 NM NM NM

fileB looks like the sample below.

POI05 A s50|088 85.77
POI15 A s10|001 65.09
POI45 B s8970|0753 85.37
POI55 B s900|08 8.77
POI75 C s560|999 55.82
POI81 C s33|0008 5.88

The match is between $3 of fileB and the first element of the array in $3 || $4 || $5 || $6 of fileA. The output should look like the sample below: it is basically fileA with a new field $7 that holds $1:$2:$4 from fileB when there is a match, or "NM" when there is none.

H01 x001 NM NM NM NM NM
H01 f005 NM s10|001:60:50 NM s10|001:500:709 POI15:A:65.09
H06 x989 NM NM NM s560|999:70:500 POI75:C:55.82
H79 r679 s560|999:1000:1100 NM NM NM POI75:C:55.82

As the example above shows, a single fileB $3 value can match several rows of fileA.

What I have been trying to do:

I obtained help with a related problem yesterday, but it lacked the complexity of (a) the match being within an array, and (b) the match being contained within any of four fields.

awk 'NR==FNR{a[$3]=$1":"$2":"$4;next}{$7=(a[$2])?a[$2]:"NM"}1' 

I need to split the array in each of $3, $4, $5 and $6 of fileA and extract the first element of each, something like:

split($3, arr, ":"); first = arr[1]

Upvotes: 3

Views: 217

Answers (1)

jaypal singh

Reputation: 77155

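This should work. It splits each line of fileA on spaces and colons and looks every resulting piece up in the table built from fileB: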

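$ awk '
NR==FNR {                            # first file (fileB): build lookup keyed on $3
    a[$3] = $1":"$2":"$4
    next
}
{
    n = split($0, tmp, /[: ]/)       # break the fileA line on spaces and colons
    for(x=1; x<=n; x++) {
        if(tmp[x] in a) {            # "in" tests the key without creating an empty entry
            print $0 FS a[tmp[x]]    # append the first hit as $7
            next
        }
    }
    print $0,"NM"                    # no piece matched
}' fileB fileA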
H01 x001 NM NM NM NM NM
H01 f005 NM s10|001:60:50 NM s10|001:500:709 POI15:A:65.09
H06 x989 NM NM NM s560|999:70:500 POI75:C:55.82
H79 r679 s560|999:1000:1100 NM NM NM POI75:C:55.82
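
Note that the loop above tests every piece of the line, including $1, $2 and the second and third elements of each array. If you would rather look only at the first element of $3 through $6, as the question describes, a minimal sketch could look like the following. It assumes fileA always has exactly those six fields; the heredocs just recreate the sample data, and the output name fileA.updated is a placeholder of mine, not something from the question.

```shell
# Recreate the sample inputs from the question.
cat > fileA <<'EOF'
H01 x001 NM NM NM NM
H01 f005 NM s10|001:60:50 NM s10|001:500:709
H06 x989 NM NM NM s560|999:70:500
H79 r679 s560|999:1000:1100 NM NM NM
EOF

cat > fileB <<'EOF'
POI05 A s50|088 85.77
POI15 A s10|001 65.09
POI45 B s8970|0753 85.37
POI55 B s900|08 8.77
POI75 C s560|999 55.82
POI81 C s33|0008 5.88
EOF

awk '
NR == FNR {                      # first file (fileB): build the lookup table
    a[$3] = $1 ":" $2 ":" $4
    next
}
{
    out = "NM"                   # default when none of the four fields match
    for (i = 3; i <= 6; i++) {   # only the four candidate fields of fileA
        split($i, arr, ":")      # arr[1] is the first element (or "NM")
        if (arr[1] in a) {
            out = a[arr[1]]      # keep the first hit, like the answer above
            break
        }
    }
    print $0, out
}' fileB fileA > fileA.updated   # fileA.updated is a placeholder name

cat fileA.updated
```

On the sample data this gives the same four lines as above. The difference only shows up if a fileB key happens to equal $1, $2 or a later array element in fileA, which the stricter loop ignores.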

Upvotes: 4
