speendo
speendo

Reputation: 13335

Remove text from integer fields in csv-files

I have a large .csv file that looks like this

19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;MMag. Dr.;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630

Everytime the 5th value is not an integer <10 (not 0-9) it should be removed. So the result should look like this

19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630

how can this be done with sed?

Upvotes: 2

Views: 122

Answers (3)

potong
potong

Reputation: 58371

This might work for you (GNU sed):

sed -r 's/^(([^;]*;){4})[^;0-9]+/\1/' file

Upvotes: 2

sampson-chen
sampson-chen

Reputation: 47269

If you can use awk instead, this would be cleaner to read than a sed solution, I think:

#!/bin/bash

awk 'BEGIN{FS=OFS=";"}
     {if (($5 >= 10) || ($5 < 0) || ($5 % 1 != 0)) {$5=""} print}' in_file

Input:

19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;MMag. Dr.;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630
19190;1936;F;999;-3;;;19000101;21000101;20110630
19190;1936;F;999;3.5;;;19000101;21000101;20110630
19190;1936;F;999;10;;;19000101;21000101;20110630

Output:

19186;1964;F;001;;;;19000101;21000101;20110630
19187;1972;M;001;;;;19000101;21000101;20110630
19190;1936;F;999;3;;;19000101;21000101;20110630
19190;1936;F;999;;;;19000101;21000101;20110630
19190;1936;F;999;;;;19000101;21000101;20110630
19190;1936;F;999;;;;19000101;21000101;20110630

Explanation:

  • awk: invoke the awk command
  • '...': provide the instructions to awk inside single quotes
  • BEGIN{FS=OFS=";"}: before reading input, tell awk to use ; as delimiters for both input and output (FS stands for Field Separator, OFS stands for Output Field Separator)
  • {if (($5 >= 10) || ($5 < 0) || ($5 % 1 != 0)) {$5=""}: If the 5th field is not between 0-9, or is not an integer, set that field to the empty string.
  • print: print the (possibly) modified line.
  • in_file: specify "in_file" as input file to your awk script
  • Optionally, add > out_file to the end of the above script to redirect output to a file instead of stdout

Alternatively: for a cleaner & more robust solution, see Ed's answer.

Upvotes: 3

Ed Morton
Ed Morton

Reputation: 203209

You could do it in sed but it's simpler with awk:

awk 'BEGIN{FS=OFS=";"} $5!~/^[0-9]$/{$5=""} 1' file

Upvotes: 3

Related Questions