Yamuna_dhungana

Reputation: 663

How to replace strings in a text file based on two lists using sed

I have a text file like this:

test.list

##a
##b
##C
#CHROM  0_62000_1       0_62000_5       0_62070_19        0_62000

I have OLD_SM.list

0_62000_1
0_62000
0_62070_19

and NEW_SM.list

APPLE
BANANA
KIWI

I want to replace each word in test.list that matches an entry in OLD_SM.list with the entry on the same line of NEW_SM.list.

I would prefer a sed command, so I tried something like this, which doesn't work:

paste OLD_SM.list NEW_SM.list | while read OLD_SM NEW_SM; do sed -i "/^#CHROM/s/[[:space:]]${OLD_SM}$/\t${NEW_SM}/g" test.list; done

Result I want

##a
##b
##C
#CHROM  APPLE       0_62000_5       KIWI        BANANA

Upvotes: 1

Views: 1327

Answers (3)

glenn jackman

Reputation: 246837

A slightly different take: build up the sed program as a bash array:

sed_opts=()

while read -r old <&3; read -r new <&4; do
    sed_opts+=( -e "s/\\<$old\\>/$new/g" )
done 3< OLD_SM.list 4< NEW_SM.list

sed "${sed_opts[@]}" test.list
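As a hedged variation on the array approach above, the sketch below adds a /^#CHROM/ address so that only the header line is touched, and prints the generated arguments so you can inspect them before running sed. It assumes GNU sed (for \< and \>), tab-separated columns, and names without regex metacharacters; the scratch directory and sample files are created only for the demonstration.

```shell
#!/usr/bin/env bash
# Demo setup in a scratch directory (assumption: columns are tab-separated).
cd "$(mktemp -d)" || exit 1
printf '##a\n##b\n##C\n#CHROM\t0_62000_1\t0_62000_5\t0_62070_19\t0_62000\n' > test.list
printf '0_62000_1\n0_62000\n0_62070_19\n' > OLD_SM.list
printf 'APPLE\nBANANA\nKIWI\n' > NEW_SM.list

# Build one -e expression per old/new pair, restricted to the #CHROM line.
sed_opts=()
while read -r old <&3 && read -r new <&4; do
    sed_opts+=( -e "/^#CHROM/s/\\<$old\\>/$new/g" )
done 3< OLD_SM.list 4< NEW_SM.list

# Show the generated sed arguments, one per line, for inspection.
printf '%s\n' "${sed_opts[@]}"

# Apply all substitutions in a single sed invocation.
result=$(sed "${sed_opts[@]}" test.list)
printf '%s\n' "$result"
```

Because _ counts as a word character, \<0_62000\> does not match inside 0_62000_5, so that column is left alone.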

Upvotes: 1

KamilCuk

Reputation: 141135

With GNU sed you can match the beginning and end of a word with \< and \>. You can first generate a sed script from your input and then pass it to sed. The input must not contain characters that are special in a sed regex or replacement (or the ~ delimiter used here).

script=$(
      paste OLD_SM.list NEW_SM.list |
      sed 's/\(.*\)\t\(.*\)/s~\\<\1\\>~\2~g/'
)
sed -i "/^#CHROM/{ $script }" test.list
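If the names might contain special characters, one possible workaround is to escape them while generating the script. This is only a sketch: escape_pattern, escape_replacement, and the demo file names are hypothetical, GNU sed is assumed, and the \< \> anchors still require each name to start and end with a word character.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: backslash-escape characters that are special in a
# BRE pattern or in a replacement, plus the ~ delimiter used below.
escape_pattern()     { printf '%s\n' "$1" | sed 's/[][\\.*^$~]/\\&/g'; }
escape_replacement() { printf '%s\n' "$1" | sed 's/[\\&~]/\\&/g'; }

# Demo data in a scratch directory (made up for illustration).
cd "$(mktemp -d)" || exit 1
printf '#CHROM\ta.b\taxb\n' > demo.list
printf 'a.b\n' > old.list
printf 'X&Y\n' > new.list

# Generate one s~old~new~g command per pair, escaping both sides.
script=""
while IFS=$'\t' read -r old new; do
    script+="s~\\<$(escape_pattern "$old")\\>~$(escape_replacement "$new")~g"$'\n'
done < <(paste old.list new.list)

# Run the generated commands on the #CHROM line only.
result=$(sed "/^#CHROM/{
$script}" demo.list)
printf '%s\n' "$result"
```

Without the escaping, the . in a.b would also match axb, and the & in the replacement would expand to the matched text.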

In your attempt, s/[[:space:]]${OLD_SM}$/ fails for most fields: the trailing $ anchors the match to the end of the line, so only the last field can ever match. You may instead use s/\(^\|[[:space:]]\)$OLD_SM\([[:space:]]\|$\)/\1$NEW_SM\2/, which matches the beginning of the line or whitespace, then the word, then whitespace or the end of the line, and puts the surrounding text back via backreferences. Topics to research: regex and backreferences in sed.
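Plugged into the loop from the question, that substitution could look like the sketch below. It assumes GNU sed (for \| in a basic regex and for -i without a suffix) and tab-separated columns; the scratch directory and sample files are created only for the demonstration.

```shell
#!/usr/bin/env bash
# Demo setup in a scratch directory (assumption: columns are tab-separated).
cd "$(mktemp -d)" || exit 1
printf '##a\n##b\n##C\n#CHROM\t0_62000_1\t0_62000_5\t0_62070_19\t0_62000\n' > test.list
printf '0_62000_1\n0_62000\n0_62070_19\n' > OLD_SM.list
printf 'APPLE\nBANANA\nKIWI\n' > NEW_SM.list

# For each old/new pair, replace the old name only when it is delimited by
# start-of-line or whitespace on the left and whitespace or end-of-line on
# the right; \1 and \2 put those delimiters back.
paste OLD_SM.list NEW_SM.list |
while IFS=$'\t' read -r OLD_SM NEW_SM; do
    sed -i "/^#CHROM/s/\(^\|[[:space:]]\)${OLD_SM}\([[:space:]]\|\$\)/\1${NEW_SM}\2/g" test.list
done

# Show the rewritten header line.
result=$(sed -n '4p' test.list)
printf '%s\n' "$result"
```

This still runs sed once per pair, so for long lists the generated-script approach is likely to be faster.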

Upvotes: 3

anubhava

Reputation: 785246

You may use this paste + awk solution:

awk -v OFS='\t' 'NR == FNR { map[$1]=$2; next} $1 == "#CHROM" {for (i=2; i<=NF; ++i) $i in map && $i=map[$i]} 1' <(paste OLD_SM.list NEW_SM.list) test.list

##a
##b
##C
#CHROM  APPLE   0_62000_5   KIWI    BANANA

Expanded form:

awk -v OFS='\t' '
NR == FNR {
   map[$1] = $2
   next
}
$1 == "#CHROM" {
   for (i=2; i<=NF; ++i)
      $i in map && $i = map[$i]
}
1' <(paste OLD_SM.list NEW_SM.list) test.list

Upvotes: 3
