Reputation: 3
I'm racking my brain trying to find a solution to this and hope someone can help. I have two files. File1 contains a long list of unique strings, each coding for a sample sequence (single column). File2 contains many records with many columns; each record groups the unique strings that have matching sample sequences. I want AWK to search for each unique string from File1 in File2 and replace it in File1 with the string in $1 of the File2 record where it was found. Strings that are not found in File2 should stay unchanged.
File1
id1
id2
id3
id4
id5
id6
id7
id8
id9
id10
File2
id1,id9,id33,id35,id36,id37,id76
id5,id7,id8,id20,id22,id23
id6,id11,id13,id14
Desired Output
id1
id2
id3
id4
id5
id6
id5
id5
id1
id10
My actual File1 has about 17,000 records in $1, and File2 has about 4,000 records with 1 to 400 fields each. Any help is appreciated!
Upvotes: 0
Views: 267
Reputation: 2514
Here's a different way to awk it. Put the following into an executable awk file:
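#!/usr/bin/awk -f
# First file (file1): index each line by content (f1) and by line number (out).
FNR==NR {f1[$0]=NR; out[NR]=$0; cnt=NR; next}
# Second file (file2): rename any file1 entry that appears in this record.
{
    split($0, f2_line, ",")
    for( fld in f2_line ) {
        f1_line_num=f1[f2_line[fld]]
        if( f1_line_num!="" ) out[f1_line_num]=f2_line[1]
    }
}
END { for( j=1;j<=cnt;j++ ) print out[j] }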
If you call the executable awk file awko, you'd run it like awko file1 file2. It yields the desired output from the inputs shown in the question.
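The breakdown:
- Read file1 into two arrays: one indexed by line content (f1), the other by line number (out).
- Split each line of file2 on commas into f2_line.
- For each field in f2_line, check if there's a line number in f1 and set it to f1_line_num.
- If f1_line_num is non-empty, replace the corresponding entry in out with the first field of that file2 line.
- At END, print out in line number order.
Upvotes: 1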
Reputation: 77155
Try this:
awk '
NR==FNR {
    lines[$0]++;
    next
}
{
    for(line in lines) {
        num = split(line, flds, /,/);
        for(i=1; i<=num; i++) {
            if(flds[i] == $1) {
                print flds[1]; next
            }
        }
    }
    print $1; next
}' file2 file1
id1
id2
id3
id4
id5
id6
id5
id5
id1
id10
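- We read file2 first and store each of its lines as a key in the lines array.
- For each line of file1, we loop over the lines array; we split each stored line with , as delimiter and store the values from the line in a flds array.
- We iterate over the flds array. If we find a value in the array that matches column 1 from file1, we print the first element of the array (that is, column 1 from file2). Otherwise we print column 1 from file1 unchanged.
Upvotes: 1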
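A note on scale: the nested loop above re-splits every file2 line for each file1 line, which may get slow with roughly 17,000 and 4,000 records. As a hedged sketch of a variant (not the answer's own method), file2 can be read once into a lookup table that maps every field to the first field of its record. The temporary directory, the shortened sample data, and the names rep and renamed are assumptions made for illustration.

```shell
# Hypothetical sample data in a temp dir (a shortened version of the question's files).
tmp=$(mktemp -d)
printf '%s\n' id1 id2 id7 id9 id10 > "$tmp/file1"
printf '%s\n' 'id1,id9,id33' 'id5,id7,id8' > "$tmp/file2"

renamed=$(awk -F, '
    # file2 (read first): map each field to the first field of its record.
    # If an id occurs in several records, the first record seen wins.
    NR == FNR { for (i = 1; i <= NF; i++) if (!($i in rep)) rep[$i] = $1; next }
    # file1: print the mapped name, or the id itself when it has no mapping.
    { print (($1 in rep) ? rep[$1] : $1) }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$renamed"
rm -rf "$tmp"
```

With this sample, id7 becomes id5 and id9 becomes id1, while id2 and id10 pass through unchanged. Each file1 line then costs a single array lookup instead of a scan over all of file2.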
Reputation: 44416
Awk I dunno. Sed?
sed 's/^\([^,]*\),\(.*\)/s;\\(\2\\);\1;/' File2 | sed 's/,/\\|/g' > temp.sed
sed -f temp.sed File1 > Desired
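One thing worth checking with real data: the generated substitutions are unanchored, so a pattern such as id7 would also match inside id76. Below is a minimal sketch of an anchored variant, assuming GNU sed (the \| alternation is a GNU extension) and hypothetical sample files in a temporary directory; the variable name result is also an assumption.

```shell
# Hypothetical sample data; the id7 record comes first on purpose,
# so an unanchored id7 pattern would turn id76 into id56.
tmp=$(mktemp -d)
printf '%s\n' id7 id76 id9 id10 > "$tmp/File1"
printf '%s\n' 'id5,id7,id8' 'id1,id9,id76' > "$tmp/File2"

# Generate one substitution per File2 record, anchored with ^ and $
# so that only whole-line ids are replaced.
sed 's/^\([^,]*\),\(.*\)/s;^\\(\2\\)$;\1;/' "$tmp/File2" |
    sed 's/,/\\|/g' > "$tmp/temp.sed"

result=$(sed -f "$tmp/temp.sed" "$tmp/File1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

For this sample the generated script contains s;^\(id7\|id8\)$;id5; and s;^\(id9\|id76\)$;id1;, and the run prints id5, id1, id1, id10.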
Upvotes: 1