Reputation: 9
I have a space-delimited text file that uses periods (.) both to mark missing data and as decimal separators. I want to replace every missing-data period with NaN
and leave the decimal separators alone. Here is an example:
Sample data:
1981 12 23 . 4.5 . .
1981 12 24 4.6 7.8 1.2 22.0
1981 12 25 . . . .
1981 12 26 2.1 . 3.1 .
Desired output:
1981 12 23 NaN 4.5 NaN NaN
1981 12 24 4.6 7.8 1.2 22.0
1981 12 25 NaN NaN NaN NaN
1981 12 26 2.1 NaN 3.1 NaN
Any help using sed, tr, or Perl in a Unix environment would be much appreciated.
Upvotes: 0
Views: 522
Reputation: 58508
This might work for you:
sed ':a;s/ \. / NaN /g;ta;s/ \.$/ NaN/' file
or, if numbers like .123 don't exist:
sed 's/ \./ NaN/g' file
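The loop in the first command is there because adjacent missing values share a space: in " . . " the match for the first dot consumes the space that the second dot would need, so a single /g pass skips every other dot. A minimal sketch of the same idea follows, split into separate -e expressions on the assumption that this is more portable across sed implementations; the function name fill_nan is made up for illustration.

```shell
# Hypothetical helper: replace stand-alone dots with NaN, reading stdin or files.
fill_nan() {
  # :a / ta  repeat the substitution until no " . " is left,
  #          which catches dots skipped because of the shared space.
  # The last expression handles a dot at the end of the line.
  sed -e ':a' -e 's/ \. / NaN /g' -e 'ta' -e 's/ \.$/ NaN/' "$@"
}

fill_nan <<'EOF'
1981 12 23 . 4.5 . .
1981 12 25 . . . .
EOF
# prints:
# 1981 12 23 NaN 4.5 NaN NaN
# 1981 12 25 NaN NaN NaN NaN
```

A dot in the very first column would not be matched by this sketch, since both patterns require a leading space; the sample data always starts with a date, so that case does not arise here.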
Upvotes: 1
Reputation: 67910
Using negated look-around assertions seems to be a good idea here.
perl -plwe 's/(?<!\d)\.(?!\d)/NaN/g;' file.txt
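In other words, a period is replaced only if neither the character before it nor the character after it is a digit. Numbers written as .1231 (as opposed to 0.1231) are left alone as well, because the look-ahead still sees a digit after the period. The first look-around only matters for numbers written with a trailing period (5. rather than 5.0); if your data has none, you can remove it.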
Upvotes: 6
Reputation: 132896
This Perl program will do it, replacing any dot without a digit next to it:
#!/Users/brian/bin/perls/perl5.14.2
while( <DATA> ) {
    # /x allows whitespace inside the pattern for readability
    s/ (?<!\d) \. (?!\d) /NaN/xg;
    print;
}
__END__
1981 12 23 . 4.5 . .
1981 12 24 4.6 7.8 1.2 22.0
1981 12 25 . . . .
1981 12 26 2.1 . 3.1 .
The same substitution works as a short Perl one-liner:
% perl -pe 's/ (?<!\d) \. (?!\d) /NaN/xg' input_file
Upvotes: 6
Reputation: 109
Check whether the character after a dot is a space or the end of the line. If it is, replace that dot with NaN.
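A stricter variant of this idea is to look at whole fields instead of single characters: a value is missing exactly when the field is a lone dot. The sketch below uses awk for that, which is a different tool from the sed, tr, and Perl asked about; the function name nan_fields is hypothetical.

```shell
# Hypothetical helper: replace every field that is exactly "." with NaN.
nan_fields() {
  # Assigning to a field makes awk rebuild the line with single spaces,
  # so runs of whitespace on modified lines are collapsed.
  awk '{ for (i = 1; i <= NF; i++) if ($i == ".") $i = "NaN"; print }' "$@"
}

printf '1981 12 26 2.1 . 3.1 .\n' | nan_fields
# prints: 1981 12 26 2.1 NaN 3.1 NaN
```

Since the comparison is against the whole field, values such as .5 or 5. are never touched, whatever surrounds them.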
Upvotes: -1