panbar
panbar

Reputation: 95

Concatenate two columns and paste the merged column

I have a tab delimited file as follows-

 loci1  loci2   name1   name2
    utr3p   utr3p   TERF1   ISCA2
    utr3p   intron  LPP PAAF1
    utr3p   intron  RPL37A  RCC1
    coding  intron  BAG2    RP11
    intron  intron  KIF1B   SNORA21
    intron  downstream  GUSBP4  CTD
    intron  intron  CLTC    VMP1
    utr3p   utr3p   PCYT1A  ZHX3

I would like to concatenate two columns name1 and name2 (joined by "__").The merged column should be pasted as a new column "merged_names" in a new file. How can I do it using awk.

Expected output -

loci1   loci2   name1   name2   merged_names
utr3p   utr3p   TERF1   ISCA2   TERF1__ISCA2
utr3p   intron  LPP PAAF1   LPP__PAAF1
utr3p   intron  RPL37A  RCC1    RPL37A__RCC1
coding  intron  BAG2    RP11    BAG2__RP11
intron  intron  KIF1B   SNORA21 KIF1B__SNORA21
intron  downstream  GUSBP4  CTD GUSBP4__CTD
intron  intron  CLTC    VMP1    CLTC__VMP1
utr3p   utr3p   PCYT1A  ZHX3    PCYT1A__ZHX3

Upvotes: 1

Views: 77

Answers (2)

sat
sat

Reputation: 14949

You can use this awk:

awk 'BEGIN{OFS=FS="\t"} NR==1{$(NF+1)="merged_names"} NR!=1{$(NF+1)=$(NF-1) "__" $NF}1' file

More shortened awk:

awk 'BEGIN{OFS=FS="\t"} {$(NF+1)=(NR==1)? "merged_names" : $(NF-1)"__"$NF}1' file

Upvotes: 2

P....
P....

Reputation: 18351

awk 'BEGIN{OFS="\t"; print "loci1  loci2   name1   name2 MERGED__NAMES"} {print $1,$2,$3,$4,$3 "__" $4}' infile
loci1  loci2   name1   name2 MERGED__NAMES
loci1   loci2   name1   name2   name1__name2
utr3p   utr3p   TERF1   ISCA2   TERF1__ISCA2
utr3p   intron  LPP     PAAF1   LPP__PAAF1
utr3p   intron  RPL37A  RCC1    RPL37A__RCC1
coding  intron  BAG2    RP11    BAG2__RP11
intron  intron  KIF1B   SNORA21 KIF1B__SNORA21
intron  downstream      GUSBP4  CTD     GUSBP4__CTD
intron  intron  CLTC    VMP1    CLTC__VMP1
utr3p   utr3p   PCYT1A  ZHX3    PCYT1A__ZHX3

Upvotes: 3

Related Questions