mentospnz

Reputation: 3

Use awk to search for a match and rename

I've been racking my brain trying to solve this and hope someone can help. I have two files. File1 contains a long list of unique strings, one per line, each coding for a sample sequence. File2 contains many records with a varying number of comma-separated fields; each record groups the unique strings that share a matching sample sequence. I want awk to look up each string from File1 in File2 and, where it is found, replace it in the output with the string in $1 of the File2 record that contains it. Strings that do not appear in File2 should be left unchanged.

File1

id1
id2
id3
id4
id5
id6
id7
id8
id9
id10

File2

id1,id9,id33,id35,id36,id37,id76
id5,id7,id8,id20,id22,id23
id6,id11,id13,id14

Desired Output

id1
id2
id3
id4
id5
id6
id5
id5
id1
id10

My actual File1 has about 17,000 records in $1, and File2 has about 4,000 records with 1 to 400 fields each. Any help is appreciated!

Upvotes: 0

Views: 267

Answers (3)

n0741337

Reputation: 2514

Here's a different way to awk it. Put the following into an executable awk file:

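#!/usr/bin/awk -f

# First file (file1): remember each id's line number and its default output.
FNR==NR {f1[$0]=NR; out[NR]=$0; cnt=NR; next}

# Second file (file2): split the record on commas and check every field.
{
    split($0, f2_line, ",")
    for( fld in f2_line ) {
        f1_line_num=f1[f2_line[fld]]
        # If the field is an id from file1, replace its output with field 1.
        if( f1_line_num!="" ) out[f1_line_num]=f2_line[1]
    }
}

# Print the (possibly replaced) ids in file1's original order.
END { for( j=1;j<=cnt;j++ ) print out[j] }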

If you name the executable awk file awko, you'd run it as awko file1 file2 (or ./awko file1 file2 if its directory is not on your PATH). It yields the desired output from the inputs shown in the question.

The breakdown:

  • Build two arrays from file1: one keyed by unique id (f1), the other by line number (out).
  • Parse each line in file2 into an array (f2_line).
  • For each field in f2_line, look up its line number in f1 and store it in f1_line_num.
  • If f1_line_num is non-empty, replace the corresponding entry in out with the first field of that file2 line.
  • At the END, print out in line-number order.

Upvotes: 1

jaypal singh

Reputation: 77155

Try this:

awk '
NR==FNR {
  lines[$0]++;
  next
}
{
  for(line in lines) {
    num = split(line, flds, /,/);
    for(i=1; i<=num; i++) {
      if(flds[i] == $1) {
        print flds[1]; next
      }
    }
  }
  print $1; next
}' file2 file1
id1
id2
id3
id4
id5
id6
id5
id5
id1
id10

  • We first scan file2 and store each entire line as a key in an array called lines.
  • Once file2 is stored completely, we move on to file1.
  • For each line of file1, we loop over the lines array, split each stored line on , and store the values in a flds array.
  • We iterate through the flds array. If a value matches column 1 from file1, we print the first element of flds (that is, column 1 from file2) and move on to the next line of file1.
  • If we don't find a match after scanning all stored lines, we just print column 1 from file1 as is.
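
With about 17,000 ids and about 4,000 records, re-splitting every stored line for every id can get slow. A possible alternative, offered only as a sketch, is to build a lookup table once while reading file2 and then do a single array lookup per id. It assumes each id appears in at most one file2 record (if not, the first record that mentions it wins). The array name rep and the scratch directory workdir are made up for this example; the data is the sample from the question.

```shell
# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' id1 id2 id3 id4 id5 id6 id7 id8 id9 id10 > "$workdir/File1"
cat > "$workdir/File2" <<'EOF'
id1,id9,id33,id35,id36,id37,id76
id5,id7,id8,id20,id22,id23
id6,id11,id13,id14
EOF

# Pass 1 (File2): map every field to the first field of its record.
# Pass 2 (File1): print the mapped id, or the original id if there is none.
# "rep" is a hypothetical array name chosen for this sketch.
result=$(awk -F, '
NR == FNR {
    for (i = 1; i <= NF; i++)
        if (!($i in rep)) rep[$i] = $1   # first record that mentions an id wins
    next
}
{ print (($1 in rep) ? rep[$1] : $1) }
' "$workdir/File2" "$workdir/File1")

printf '%s\n' "$result"
```

The ($i in rep) test checks membership without creating an empty entry, so the table only ever holds ids that really occur in file2.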

Upvotes: 1

Michael Lorton

Reputation: 44416

Awk I dunno. Sed?

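# Build one anchored substitution per File2 record (-n ... p skips single-field records).
# Note: \| alternation is a GNU sed extension.
sed -n 's/^\([^,]*\),\(.*\)/s;^\\(\2\\)$;\1;/p' File2 | sed 's/,/\\|/g' > temp.sed
sed -f temp.sed File1 > Desired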
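
To see what the generated script looks like, here is a small sketch run against the sample data from the question. It is a variant rather than the exact commands above: it assumes a sed that supports -E (extended regular expressions, so alternation is a plain |), anchors each pattern with ^ and $ so only whole lines are rewritten, and uses -n with the p flag so records without a comma produce no rule. The workdir variable and file locations are made up for the example.

```shell
# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' id1 id2 id3 id4 id5 id6 id7 id8 id9 id10 > "$workdir/File1"
cat > "$workdir/File2" <<'EOF'
id1,id9,id33,id35,id36,id37,id76
id5,id7,id8,id20,id22,id23
id6,id11,id13,id14
EOF

# Step 1: one substitution rule per File2 record: s;^(rest,of,fields)$;first;
# Step 2: turn the commas inside each rule into | alternation.
sed -n 's/^\([^,]*\),\(.*\)/s;^(\2)$;\1;/p' "$workdir/File2" |
    sed 's/,/|/g' > "$workdir/temp.sed"

# Show the generated rules.
cat "$workdir/temp.sed"

# Step 3: apply the generated rules to File1.
sed -E -f "$workdir/temp.sed" "$workdir/File1" > "$workdir/Desired"
cat "$workdir/Desired"
```

For the sample File2, the first generated rule is s;^(id9|id33|id35|id36|id37|id76)$;id1; and the anchors keep an id such as id90 from being rewritten just because it starts with id9.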

Upvotes: 1
