Reputation: 21
I am converting a lot of CSV files with bash scripts. They all have the same structure and the same header names. The values in the columns are variable of course. Col4 is always an integer.
Source file:
Col1;Col2;Col3;Col4 Name1;Street1;City1;2 Name2;Street2;City2;12 Name3;Street3;City3;15 Name4;Street4;City4;10 Name5;Street5;City5;3
Now when Col4 contains a certain value, for example "10", the value has to be changed in "10 pcs" and the complete line has to be duplicated. For every 5 pcs one line. So you could say that the number of duplicates is the value of Col4 divided by 5 and then rounded up. So if Col4 = 10 I need 2 duplicates and if Col4 = 12, I need 3 duplicates.
Result file:
Col1;Col2;Col3;Col4 Name1;Street1;City1;2 Name2;Street2;City2;... of 12 Name2;Street2;City2;... of 12 Name2;Street2;City2;... of 12 Name3;Street3;City3;... of 15 Name3;Street3;City3;... of 15 Name3;Street3;City3;... of 15 Name4;Street4;City4;... of 10 Name4;Street4;City4;... of 10 Name5;Street5;City5;3
Can anyone help me to put this in a script. Something with bash, sed, awk. These are the languages I'm familiar with. Although I'm interested in other solutions too.
Upvotes: 0
Views: 468
Reputation: 21
Thanks Jay! This was just what I needed.
Here is the final awk code I'm using now:
awk -F\; '$4 == "Col4" {print}; $4 < 5 {print}; $4 == 5 {print}; $4 > 5 {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input
I added the rule below to print the header, because it wasn't printed
$4 == "Col4" {print}
I added the rule below to print the lines where the value is equal to 5
$4 == 5 {print}
Upvotes: 0
Reputation: 1888
Here is the awk code assuming that the input is in a file called /tmp/input
awk -F\; '$4 < 5 {print}; $4 > 5 {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input
Explanation:
There are two rules.
First rule prints any rows where the $4
is less than 5. This will also print the header
$4 < 5 {print}
The second rule print if $4
is greater than 5. The loop runs $4/5
times:
$4 > 5 {for (i=0; i< ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}
Output:
Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;...of 12
Name2;Street2;City2;...of 12
Name2;Street2;City2;...of 12
Name3;Street3;City3;...of 15
Name3;Street3;City3;...of 15
Name3;Street3;City3;...of 15
Name4;Street4;City4;...of 10
Name4;Street4;City4;...of 10
Name5;Street5;City5;3
The code does not handle the use case where $4 == 5
. You can handle that by adding a third rule. I did not added that. But I think you got the idea.
Upvotes: 0