AVA

Reputation: 101

Replacing a string in the beginning of some rows in two columns with another string in linux

I have a tab-separated text file. Columns 1 and 2 contain family and individual IDs that start with a letter prefix followed by a number, as follows:

HG1005 HG1005
HG1006 HG1006
HG1007 HG1007
NA1008 NA1008
NA1009 NA1009

I would like to replace NA with HG in both columns. I am very new to Linux and tried the following code, among others:

awk '{sub("NA","HG",$2)';print}' input file > output file

Any help is highly appreciated.

Upvotes: 1

Views: 1114

Answers (3)

Wiktor Stribiżew

Reputation: 626738

In your call to sub, the $2 target restricts the replacement to the second field, and sub replaces only the first occurrence of NA there, so the first column is left untouched.

Note that while sed is more typical for such scenarios:

sed 's/NA/HG/g' inputfile > outputfile

you can still use awk:

awk '{gsub("NA","HG")}1' inputfile > outputfile


Since gsub (which replaces every match, not just the first) is given no target argument here, it operates on the default $0, i.e. the whole current record (the current line). The code above is therefore equivalent to awk '{gsub("NA","HG",$0)}1' inputfile > outputfile.

The 1 at the end is an always-true pattern with no action, so awk runs its default action and prints the current record. It is a shorter variant of print.
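As a quick check of the two equivalences described above, the following sketch runs both spellings on two rows from the question's sample and compares the results. The variable names (sample, implicit, explicit) are only illustrative and are not part of the original answer.

```shell
# Two tab-separated rows taken from the question's sample data.
sample=$(printf 'HG1007\tHG1007\nNA1008\tNA1008')

# gsub without a target works on $0; the trailing 1 prints the record.
implicit=$(printf '%s\n' "$sample" | awk '{gsub("NA","HG")}1')

# The same command with the target and the print spelled out.
explicit=$(printf '%s\n' "$sample" | awk '{gsub("NA","HG",$0); print}')

printf '%s\n' "$implicit"
```

Both variables should end up holding the same two lines, with the tabs intact, because a substitution on $0 does not rebuild the record from its fields.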

Upvotes: 1

Carlos Pascual

Reputation: 1126

Note that /^NA/ anchors the match to the beginning of the field:

awk '{for(i=1;i<=NF;i++)if($i ~ /^NA/) sub(/^NA/,"HG",$(i))} 1' file
HG1005 HG1005
HG1006 HG1006
HG1007 HG1007
HG1008 HG1008
HG1009 HG1009

To save the result to a file:

awk '{for(i=1;i<=NF;i++)if($i ~ /^NA/) sub(/^NA/,"HG",$(i))} 1' file > outputfile

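If the file is tab-separated, set both FS and OFS to a tab so the tabs are preserved in the output (with the defaults, modifying a field rebuilds the line with single spaces):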

awk 'BEGIN{FS=OFS="\t"} {for(i=1;i<=NF;i++)if($i ~ /^NA/) sub(/^NA/,"HG",$(i))} 1' file > outputfile
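To see what the ^ anchor changes, here is a small sketch on a made-up row. The third column (DNA_NA) is hypothetical, added only to show a field that contains NA without starting with it, and the variable names are illustrative.

```shell
# Hypothetical row: two IDs from the question plus an invented third column.
row=$(printf 'NA1008\tNA1008\tDNA_NA')

# Anchored: only an NA at the very start of a field is replaced.
anchored=$(printf '%s\n' "$row" |
  awk 'BEGIN{FS=OFS="\t"} {for(i=1;i<=NF;i++) sub(/^NA/,"HG",$i)} 1')

# Unanchored, for comparison: every NA anywhere on the line is replaced.
unanchored=$(printf '%s\n' "$row" | awk '{gsub("NA","HG")}1')

printf '%s\n%s\n' "$anchored" "$unanchored"
```

The anchored version leaves the third column as DNA_NA, while the unanchored one turns it into DHG_HG. Note that the loop still visits every field, so a later column that happens to start with NA would also be changed.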

Upvotes: 1

RavinderSingh13

Reputation: 133438

Converting my comment to an answer: use gsub instead of sub here, because it substitutes every occurrence of NA with HG in the line.

awk 'BEGIN{FS=OFS="\t"} {gsub("NA","HG");print}' inputfile > outputfile

Or use the following if you have several fields and want to perform the substitution only in the 1st and 2nd fields.

awk 'BEGIN{FS=OFS="\t"} {sub("NA","HG",$1);sub("NA","HG",$2);print}' inputfile > outputfile

Change sub to gsub in the second command if multiple occurrences of NA need to be changed within a field.
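The difference between the two variants can be sketched on an invented row. The row below is hypothetical: its first field contains NA twice and its third field is a bare NA that should stay as it is. The variable names are illustrative too.

```shell
# Hypothetical tab-separated row: NA twice in field 1, a bare NA in field 3.
row=$(printf 'NA10NA\tNA1008\tNA')

# sub: only the first NA in each of fields 1 and 2 is replaced.
with_sub=$(printf '%s\n' "$row" |
  awk 'BEGIN{FS=OFS="\t"} {sub("NA","HG",$1);sub("NA","HG",$2);print}')

# gsub: every NA in fields 1 and 2 is replaced.
with_gsub=$(printf '%s\n' "$row" |
  awk 'BEGIN{FS=OFS="\t"} {gsub("NA","HG",$1);gsub("NA","HG",$2);print}')

printf '%s\n%s\n' "$with_sub" "$with_gsub"
```

In both cases the third field is left alone, since only $1 and $2 are named as targets. The first field becomes HG10NA with sub and HG10HG with gsub.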

Upvotes: 2
