Tom

Reputation: 911

Replacing empty fields in delimited text file with dummy value

I am working on a project that takes a delimited set of data of the form:

field1~field2~field3~.....~fieldn

Having empty fields is a possibility, so

field1~~~field4~~field6

is perfectly acceptable.

This file gets translated using an in-house translator program that leaves a little to be desired. Specifically, it doesn't deal with empty fields well. My solution was to stick some dummy value in there, like a space or an @ sign. I've tried:

sed -r 's/~~/~ ~/g'

and

awk '{gsub(/\~\~/,"~ ~")}; 1' file > file.SPACE

but both of these fall short when MULTIPLE empty fields appear in a row. So if I input

field1~field2~~~field3

it'll output:

field1~field2~ ~~field3

I'd like to just script this if I could, as I can't change the code of the translator. I can change the code in the program that creates the delimited file, but I'd rather not. Is there some workaround, or is the lack of a single expression for this just one of the inherent limitations of a regular language?

EDIT: Wow, thanks for the quick responses, everyone. All your solutions worked, so I upvoted all of them. I think I'm going to accept Janito's because of the explanation.

Also why the downvote?

Upvotes: 1

Views: 202

Answers (4)

William Pursell

Reputation: 212684

awk '{for( i=1; i<=NF; i++ ) if( $i ~ /^$/ ) $i = " " } 1' FS='~' OFS='~' input

or:

awk '/^$/{ $0 = " " } 1' ORS='~' RS='~' input

or:

awk '{ while( gsub( "~~", "~ ~" )); }1' input

Upvotes: 3

Ωmega

Reputation: 43703

You can use Perl:

perl -pe 's/~(?=~)/~ /g'

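...which says: replace each "~" that is followed by another "~" with "~ ". The lookahead (?=~) does not consume the second "~", so that "~" can still be matched itself.

To store the result in file.SPACE, use: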

perl -pe 's/~(?=~)/~ /g' file >file.SPACE

Upvotes: 1

Andrew Clark

Reputation: 208725

sed -e ':loop' -e 's/~~/~ ~/g' -e 't loop' file

Upvotes: 1

You could try:

sed -e ':a;s/~~/~ ~/;ta'

This creates a label "a" with the ":" command, then replaces one occurrence of ~~ with ~ ~, and then uses the "t" (test) command to jump back to the "a" label if the previous substitute command succeeded.
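
To make the difference visible, here is a rough sketch comparing a single global pass with the loop, using the sample line from the question. The function names one_pass and fill_empty_fields are hypothetical, and the loop is written with separate -e expressions on the assumption that this form is accepted by both GNU and BSD sed.

```shell
# A single global pass: matches cannot overlap, so in "~~~" the first two
# tildes are consumed together and the third has no partner left.
one_pass() {
    sed -e 's/~~/~ ~/g'
}

# The loop: "t" branches back to label "a" whenever the preceding
# substitution changed the line, so it repeats until no "~~" remains.
fill_empty_fields() {
    sed -e ':a' -e 's/~~/~ ~/' -e 'ta'
}

line='field1~field2~~~field3'
printf '%s\n' "$line" | one_pass            # field1~field2~ ~~field3
printf '%s\n' "$line" | fill_empty_fields   # field1~field2~ ~ ~field3
```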

Hope this helps =)

Upvotes: 4
