Reputation: 145
I have a CSV file that looks like this:
12625,6475,387,-388,-332,-217,-104,17,125,160,121,38,-101,-282,-368
-2675,6475,420,-385,-330,-217,-106,16,124,158,120,37,-104,-281,-365
2725,6475,633,-377,-327,-222,-117,6,113,148,109,26,-114,-282,-359
-12775,6475,927,-367,-324,-229,-133,-9,99,134,95,11,-128,-283,-351
12825,64751200,-357,-320,-236,-147,-23,86,121,82,-3,-140,-283,-344
          ^ missing comma
In some rows I have the problem shown in the last row of the example, where a comma is missing between the second and third column. I know from the data that the most digits a legitimate entry can have is 5 (in some cases with a - in front) and all entries that have 8 digits originate from missing commas, which should appear after the fourth digit.
I am looking for an expression, presumably with sed, that inserts a comma after the fourth digit of all 8-digit numbers in the file.
What I have so far is
echo "12356" | sed 's/\B[0-9]\{3\}/&,/g'
which will insert a comma after four digits. How can I filter this so that it only happens for 8-digit numbers, not for 5-digit numbers?
I am also open to any more elegant way that might exist to solve that problem.
Thank you
Upvotes: 0
Views: 364
Reputation: 2778
Because sed has already been mentioned, here’s some awk…
awk -F, -vOFS=, '{
    for (i = 1; i <= NF; ++i)
        if (length($i) >= 8)
            $i = substr($i, 1, 4) "," substr($i, 5)
} 1' < some_file.csv
…and here’s some pure Bash, for no good reason:
(
    IFS=,
    while read -ra line; do
        for i in "${!line[@]}"; do
            ((${#line[i]} >= 8)) && line[i]="${line[i]::4},${line[i]:4}"
        done
        printf '%s\n' "${line[*]}"
    done
) < some_file.csv
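Since the question did ask for sed, here is a rough sketch of that as well. It assumes GNU sed, because \b (word boundary) is a GNU extension rather than POSIX, and the function name fix_missing_commas is made up for illustration. It only touches runs of exactly eight digits, so legitimate five-digit values are left alone:

```shell
# Hypothetical helper: split every standalone 8-digit number into 4 + 4 digits.
# Assumes GNU sed, since \b (word boundary) is not part of POSIX sed.
fix_missing_commas() {
    sed 's/\b\([0-9]\{4\}\)\([0-9]\{4\}\)\b/\1,\2/g' "$@"
}

# Try it on a shortened version of the broken row from the question.
printf '%s\n' '12825,64751200,-357,-320' | fix_missing_commas
```

With no arguments the function reads standard input; with a file name (for example some_file.csv) it reads that file instead. The word boundaries on both sides are what keep it from matching part of a longer or shorter run of digits.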
Upvotes: 1