Reputation: 49
I have two huge files (with more than 1000 rows).
File-1
head File-1
1_10 PL14
1_13 GH13
13_12 GH20
13_137 GH10
13_35 GT19
14_128 GH36
14_131 GH42
14_65 GH109
15_28 GT30
15_30 GH13
16_3 CE1
File-2
head File-2
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name
1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450
1_10 0.030 0.016 2.114 0 0.001 0.000 0.072 2055
1_11 0.012 0.014 1.739 0 0 0 0.0237 171
1_12 0.082 0.071 3.763 0.021 0 0.014 0.102 357
1_13 0.035 0.01 3.836 0 0 0 0.103 234
1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082 1125
I want to map the IDs in File-1 onto the first column of File-2, without printing the last column of File-2. Ideally, I would like to learn how to produce both Output-1 and Output-2.
Output-1
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name
1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072 2055
1_11 0.012 0.014 1.739 0 0 0 0.0237 171
1_12 0.082 0.071 3.763 0.021 0 0.014 0.102 357
GH13 0.035 0.01 3.836 0 0 0 0.103 234
1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082 1125
Output-2 (unmapped rows are not printed)
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072 2055
GH13 0.035 0.01 3.836 0 0 0 0.103 234
I tried:
awk '
NR==FNR {
a[$1]=$2
next
}
{
print (($1 in a)?a[$1]:$1, $2, $3, $4, $5,$6, $7, $8)
}' File-1 File-2 > Output
But the Output just shows the content of File-2.
Corrections to my awk code or any other suggestions (sed, Perl) will be appreciated.
Upvotes: 0
Views: 77
Reputation: 3975
awk '
NR==FNR{                  # first file (File-1)
    a[$1]=$2;             # map ID (column 1) to its label (column 2)
    next                  # skip the remaining rules for File-1
}
{                         # every line of File-2
    NF--                  # drop the last column (gawk/mawk; some awks do not rebuild $0 here)
}
FNR==1{                   # header line of File-2
    print > "Output1";    # write the header to both outputs
    print > "Output2";
    next
}
!($1 in a){               # ID not found in File-1
    print > "Output1"     # unmapped rows go to Output1 only
}
($1 in a){                # ID found in File-1
    $1=a[$1];             # replace the ID with its label
    print > "Output2";    # mapped rows go to both outputs
    print > "Output1"
}' File-1 File-2
$ head Output1 Output2
==> Output1 <==
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam
1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072
1_11 0.012 0.014 1.739 0 0 0 0.0237
1_12 0.082 0.071 3.763 0.021 0 0.014 0.102
GH13 0.035 0.01 3.836 0 0 0 0.103
1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082
==> Output2 <==
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072
GH13 0.035 0.01 3.836 0 0 0 0.103
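If your awk does not rebuild the record after NF--, or if the output still looks like an unmodified File-2, a more defensive variant may help. The sketch below makes two assumptions that the question does not confirm: that the awk in use may not support decrementing NF, and that the files may carry Windows line endings (a trailing carriage return would stop the File-1 labels from printing cleanly). It builds each output row by hand instead of editing NF. The sample files are just the first rows from the question, and the variable names row and mapped are mine.

```shell
# Sample data taken from the question (only a few rows)
cat > File-1 <<'EOF'
1_10 PL14
1_13 GH13
EOF

cat > File-2 <<'EOF'
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name
1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450
1_10 0.030 0.016 2.114 0 0.001 0.000 0.072 2055
1_11 0.012 0.014 1.739 0 0 0 0.0237 171
1_13 0.035 0.01 3.836 0 0 0 0.103 234
EOF

awk '
{ sub(/\r$/, "") }                # strip a trailing CR if present (assumed, not confirmed)
NR==FNR { a[$1]=$2; next }        # File-1: ID -> label
{
    mapped = ($1 in a)            # 1 if this ID has a label, else 0
    row = (mapped ? a[$1] : $1)   # first column: label if mapped, else original ID
    for (i = 2; i < NF; i++)      # append columns 2..NF-1, skipping the last column
        row = row OFS $i
}
FNR==1 { print row > "Output1"; print row > "Output2"; next }   # header to both files
{ print row > "Output1" }         # every data row goes to Output1
mapped { print row > "Output2" }  # only mapped rows go to Output2
' File-1 File-2
```

With the sample rows from the question this should give the same Output1 and Output2 as shown above. Note that NR==FNR assumes File-1 is not empty; with an empty File-1 the first rule would consume File-2 instead.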
Upvotes: 1