JB Veenstra
JB Veenstra

Reputation: 21

Conditional replacement of field value in csv file

I am converting a lot of CSV files with bash scripts. They all have the same structure and the same header names. The values in the columns are variable of course. Col4 is always an integer.

Source file:

Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;12
Name3;Street3;City3;15
Name4;Street4;City4;10
Name5;Street5;City5;3

Now when Col4 contains a certain value, for example "10", the value has to be changed in "10 pcs" and the complete line has to be duplicated. For every 5 pcs one line. So you could say that the number of duplicates is the value of Col4 divided by 5 and then rounded up. So if Col4 = 10 I need 2 duplicates and if Col4 = 12, I need 3 duplicates.

Result file:

Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;... of 12
Name2;Street2;City2;... of 12
Name2;Street2;City2;... of 12
Name3;Street3;City3;... of 15
Name3;Street3;City3;... of 15
Name3;Street3;City3;... of 15
Name4;Street4;City4;... of 10
Name4;Street4;City4;... of 10
Name5;Street5;City5;3

Can anyone help me to put this in a script. Something with bash, sed, awk. These are the languages I'm familiar with. Although I'm interested in other solutions too.

Upvotes: 0

Views: 468

Answers (2)

JB Veenstra
JB Veenstra

Reputation: 21

Thanks Jay! This was just what I needed.

Here is the final awk code I'm using now:

 awk -F\; '$4 == "Col4" {print}; $4 < 5 {print}; $4 == 5 {print}; $4 > 5  {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input

I added the rule below to print the header, because it wasn't printed

 $4 == "Col4" {print}

I added the rule below to print the lines where the value is equal to 5

 $4 == 5 {print}

Upvotes: 0

Jay Rajput
Jay Rajput

Reputation: 1888

Here is the awk code assuming that the input is in a file called /tmp/input

 awk -F\; '$4 < 5 {print}; $4 > 5  {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input

Explanation:

There are two rules.

First rule prints any rows where the $4 is less than 5. This will also print the header

 $4 < 5 {print}

The second rule print if $4 is greater than 5. The loop runs $4/5 times:

 $4 > 5 {for (i=0; i< ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}

Output:

    Col1;Col2;Col3;Col4
    Name1;Street1;City1;2
    Name2;Street2;City2;...of 12
    Name2;Street2;City2;...of 12
    Name2;Street2;City2;...of 12
    Name3;Street3;City3;...of 15
    Name3;Street3;City3;...of 15
    Name3;Street3;City3;...of 15
    Name4;Street4;City4;...of 10
    Name4;Street4;City4;...of 10
    Name5;Street5;City5;3

The code does not handle the use case where $4 == 5. You can handle that by adding a third rule. I did not added that. But I think you got the idea.

Upvotes: 0

Related Questions