Reputation: 163
Have a badly formatted tsv file with empty fields all over the place. I wish to fill these empty spaces with "NA" on linux.
I tried awk '{gsub("\t\t","\tNA\t"); print$0)'
but that only substitutes one empty space to NA instance. Chaining the command awk '{gsub("\t\t","\tNA\t"); print$0)|awk '{gsub("\t\t","\tNA\t"); print$0)
does two substitutions per line - but not particularly helpful if I have many columns to deal with.
Is there a faster (neater) way to do this?
Upvotes: 0
Views: 58
Reputation: 2445
Its a bit complex since you have to handle newlines empty fields, end of line empty fields and potentially successive empty fields. I could not achieve something with sed, it's probably insane. But with awk this seems to work:
$ cat test.txt
a c d e
g h i j
k l m n
p s t
w x
$ awk -F$'\t' '{for(i=1;i<=NF;++i){if($i==""){printf "NA"}else{printf $i} if(i<NF)printf "\t"} printf "\n"}' test.txt
a NA c d e
NA g h i j
k l m n NA
p NA NA s t
NA NA w x NA
Beware copy paste, the tabs will probably be transformed to spaces... By the way I searched a solution for the CSV files, and adapted it from this thread ;) where you can see that the most readable option is the awk one.
Upvotes: 1
Reputation: 2246
Did you try with sed? For example:
cat test.txt
test test test
test test test
sed 's:\t\t*:\tNA\t:g' test.txt
test NA test NA test
test NA test NA test
Upvotes: 1
Reputation: 163
Ok this works:
awk '{ gsub(/\t\t\t/,"\tNA\tNA\t"); print $0}' test.txt | awk '{ gsub(/\t\t/,"\tNA\t"); print $0}' | awk '{ gsub(/\t\t/,"\tNA\t"); print
$0}' | awk '{gsub(/^[\t]+/,"NA\t"); print $0}'
interestingly this doesn't:
awk '{ gsub(/\t\t\t/,"\tNA\tNA\t"); print $0}' test.txt | awk '{ gsub(/\t\t/,"\tNA\t"); print $0}' | awk '{gsub(/^[\t]+/,"NA\t"); print
$0}'
I'm sure there is a more elegant solution though..
Upvotes: 0