Reputation: 1096
I have spent a lot of time on this any help would be appreciated.
I have two files as below; what I want to do is to search for every item of f1_col1
and f1_col2
separately inside the f2_col3
- if an item exists then save it and add its related row from the (f2_col3)
to a new column in the new df.
f1:(two columns)
f1_col1,f1_col2
kctd,Malat1
Gas5,Snhg6
f2:(three columns)
f2_col1,f2_col2,f2_col3
chr7,snRNA,Gas5
chr1,protein_coding,Malat1
chr2,TEC,Snhg6
chr1,TEC,kctd
So based on the two files mentioned the desired output should be:
new_df:
f1_col1,f1_col2,f2_col1,f2_col1
kctd,Malat1,chr1,chr1
Gas5,Snhg6,chr7,chr2
note: f2_col2 is not important.
I do not have a strong programming background and found this very difficult - Even though I have checked multiple sources but have not been able to develop a solution - any help is appreciated. Thanks
Upvotes: 1
Views: 80
Reputation: 204458
Based on 1 possible interpretation of your requirements and the 1 sunny-day example you provided where every key field always matches on every line, this MAY be what you're trying to do:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {
if ( FNR == 1 ) {
hdr = $1
}
map[$3] = $1
next
}
{ print $0, ( FNR>1 ? map[$1] OFS map[$2] : hdr OFS hdr ) }
$ awk -f tst.awk f2 f1
f1_col1,f1_col2,f2_col1,f2_col1
kctd,Malat1,chr1,chr1
Gas5,Snhg6,chr7,chr2
Upvotes: 4