Turtle
Turtle

Reputation: 163

substitute consecutive tabs for "\tNA\t"

Have a badly formatted tsv file with empty fields all over the place. I wish to fill these empty spaces with "NA" on linux.

I tried awk '{gsub("\t\t","\tNA\t"); print$0)' but that only substitutes one empty space to NA instance. Chaining the command awk '{gsub("\t\t","\tNA\t"); print$0)|awk '{gsub("\t\t","\tNA\t"); print$0) does two substitutions per line - but not particularly helpful if I have many columns to deal with.

Is there a faster (neater) way to do this?

Upvotes: 0

Views: 58

Answers (3)

Emilien
Emilien

Reputation: 2445

Its a bit complex since you have to handle newlines empty fields, end of line empty fields and potentially successive empty fields. I could not achieve something with sed, it's probably insane. But with awk this seems to work:

$ cat test.txt
a       c   d   e
    g   h   i   j
k   l   m   n   
p           s   t
        w   x   

$ awk -F$'\t' '{for(i=1;i<=NF;++i){if($i==""){printf "NA"}else{printf $i} if(i<NF)printf "\t"} printf "\n"}' test.txt 
a   NA  c   d   e
NA  g   h   i   j
k   l   m   n   NA
p   NA  NA  s   t
NA  NA  w   x   NA

Beware copy paste, the tabs will probably be transformed to spaces... By the way I searched a solution for the CSV files, and adapted it from this thread ;) where you can see that the most readable option is the awk one.

Upvotes: 1

molivier
molivier

Reputation: 2246

Did you try with sed? For example:

cat test.txt
test        test        test
test        test        test

sed 's:\t\t*:\tNA\t:g' test.txt

test    NA  test    NA  test
test    NA  test    NA  test

Upvotes: 1

Turtle
Turtle

Reputation: 163

Ok this works: awk '{ gsub(/\t\t\t/,"\tNA\tNA\t"); print $0}' test.txt | awk '{ gsub(/\t\t/,"\tNA\t"); print $0}' | awk '{ gsub(/\t\t/,"\tNA\t"); print $0}' | awk '{gsub(/^[\t]+/,"NA\t"); print $0}'

interestingly this doesn't: awk '{ gsub(/\t\t\t/,"\tNA\tNA\t"); print $0}' test.txt | awk '{ gsub(/\t\t/,"\tNA\t"); print $0}' | awk '{gsub(/^[\t]+/,"NA\t"); print $0}'

I'm sure there is a more elegant solution though..

Upvotes: 0

Related Questions