Reputation: 13335
I have a large .csv file that looks like this
19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;MMag. Dr.;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630
Everytime the 5th value is not an integer <10 (not 0-9) it should be removed. So the result should look like this
19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630
how can this be done with sed
?
Upvotes: 2
Views: 122
Reputation: 58371
This might work for you (GNU sed):
sed -r 's/^(([^;]*;){4})[^;0-9]+/\1/' file
Upvotes: 2
Reputation: 47269
If you can use awk
instead, this would be cleaner to read than a sed
solution, I think:
#!/bin/bash
awk 'BEGIN{FS=OFS=";"}
{if (($5 >= 10) || ($5 < 0) || ($5 % 1 != 0)) {$5=""} print}' in_file
Input:
19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;MMag. Dr.;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630
19190;1936;F;999;-3;;;19000101;21000101;20110630
19190;1936;F;999;3.5;;;19000101;21000101;20110630
19190;1936;F;999;10;;;19000101;21000101;20110630
Output:
19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630
19190;1936;F;999;;;;19000101;21000101;20110630
19190;1936;F;999;;;;19000101;21000101;20110630
19190;1936;F;999;;;;19000101;21000101;20110630
Explanation:
awk
: invoke the awk command'...'
: provide the instructions to awk inside single quotesBEGIN{FS=OFS=";"}
: before reading input, tell awk
to use ;
as delimiters for both input and output (FS stands for Field Separator, OFS stands for Output Field Separator){if (($5 >= 10) || ($5 < 0) || ($5 % 1 != 0)) {$5=""}
: If the 5th field is not between 0-9
, or is not an integer, set that field to the empty string.print
: print the (possibly) modified line.in_file
: specify "in_file" as input file to your awk
script> out_file
to the end of the above script to redirect output to a file instead of stdout
Alternatively: for a cleaner & more robust solution, see Ed's answer.
Upvotes: 3
Reputation: 203209
You could do it in sed but it's simpler with awk:
awk 'BEGIN{FS=OFS=";"} $5!~/^[0-9]$/{$5=""} 1' file
Upvotes: 3