Reputation: 95
I have a tab delimited file as follows-
loci1 loci2 name1 name2
utr3p utr3p TERF1 ISCA2
utr3p intron LPP PAAF1
utr3p intron RPL37A RCC1
coding intron BAG2 RP11
intron intron KIF1B SNORA21
intron downstream GUSBP4 CTD
intron intron CLTC VMP1
utr3p utr3p PCYT1A ZHX3
I would like to concatenate two columns name1 and name2 (joined by "__").The merged column should be pasted as a new column "merged_names" in a new file. How can I do it using awk.
Expected output -
loci1 loci2 name1 name2 merged_names
utr3p utr3p TERF1 ISCA2 TERF1__ISCA2
utr3p intron LPP PAAF1 LPP__PAAF1
utr3p intron RPL37A RCC1 RPL37A__RCC1
coding intron BAG2 RP11 BAG2__RP11
intron intron KIF1B SNORA21 KIF1B__SNORA21
intron downstream GUSBP4 CTD GUSBP4__CTD
intron intron CLTC VMP1 CLTC__VMP1
utr3p utr3p PCYT1A ZHX3 PCYT1A__ZHX3
Upvotes: 1
Views: 77
Reputation: 14949
You can use this awk
:
awk 'BEGIN{OFS=FS="\t"} NR==1{$(NF+1)="merged_names"} NR!=1{$(NF+1)=$(NF-1) "__" $NF}1' file
More shortened awk
:
awk 'BEGIN{OFS=FS="\t"} {$(NF+1)=(NR==1)? "merged_names" : $(NF-1)"__"$NF}1' file
Upvotes: 2
Reputation: 18351
awk 'BEGIN{OFS="\t"; print "loci1 loci2 name1 name2 MERGED__NAMES"} {print $1,$2,$3,$4,$3 "__" $4}' infile
loci1 loci2 name1 name2 MERGED__NAMES
loci1 loci2 name1 name2 name1__name2
utr3p utr3p TERF1 ISCA2 TERF1__ISCA2
utr3p intron LPP PAAF1 LPP__PAAF1
utr3p intron RPL37A RCC1 RPL37A__RCC1
coding intron BAG2 RP11 BAG2__RP11
intron intron KIF1B SNORA21 KIF1B__SNORA21
intron downstream GUSBP4 CTD GUSBP4__CTD
intron intron CLTC VMP1 CLTC__VMP1
utr3p utr3p PCYT1A ZHX3 PCYT1A__ZHX3
Upvotes: 3