Reputation: 231
I have a CSV with several columns, one of which contains an email address. The script should build a string from the other columns and add it to the CSV as an additional column. At the moment the script runs through each line and builds the string from that line alone. What I want instead is for the script to build the string from all rows that share the same email address. The tricky part is that I don't know the email addresses in advance, so I can't hardcode them; the rows have to be grouped dynamically.
What could be an approach to do this?
This is my initial CSV:
42342;home;2020-01-12;2020-01-13;[email protected]
45235;home;2020-04-12;2020-04-13;[email protected]
68787;photo;2020-05-12;2020-05-13;[email protected]
68787;test;2020-05-12;2020-05-13;[email protected]
68787;test;2020-05-12;2020-05-13;[email protected]
68787;test;2020-05-12;2020-05-13;[email protected]
This is how the CSV should look at the end:
42342;home;2020-01-12;2020-01-13;[email protected];home 2020-01-12_2020-01-13 && home 2020-04-12_2020-04-13
45235;home;2020-04-12;2020-04-13;[email protected];home 2020-01-12_2020-01-13 && home 2020-04-12_2020-04-13
68787;photo;2020-05-12;2020-05-13;[email protected];photo 2020-05-12_2020-05-13
68787;test;2020-05-12;2020-05-13;[email protected];test 2020-05-12_2020-05-13 && test 2020-05-12_2020-05-13 && test 2020-05-12_2020-05-13
68787;test;2020-05-12;2020-05-13;[email protected];test 2020-05-12_2020-05-13 && test 2020-05-12_2020-05-13 && test 2020-05-12_2020-05-13
68787;test;2020-05-12;2020-05-13;[email protected];test 2020-05-12_2020-05-13 && test 2020-05-12_2020-05-13 && test 2020-05-12_2020-05-13
This is my bash script:
getPhotosCommand(){
    com=""
    header="ID;DIR;START_DATE;END_DATE"
    while read -r line; do
        IFS=';' read -r -a array <<< "$line"
        dir=${array[1]}          # fields are 0-indexed: ID;DIR;START_DATE;END_DATE;EMAIL
        start_date=${array[2]}
        end_date=${array[3]}
        # braces are needed here, otherwise bash looks for a variable named "start_date_"
        newCom="$dir ${start_date}_${end_date} && "
        com=$com$newCom
    done < "$file_new_photos"
    echo "$com"
}
Upvotes: 1
Views: 109
Reputation: 17208
A two-pass awk solution that keeps the original order of the rows. The first pass builds the 6th field for each email; the second pass appends it to the corresponding rows.
awk '
BEGIN { FS = OFS = ";" }
{
    if (NR == FNR) {
        str = $2 " " $3 " " $4
        if (arr[$5]) {
            arr[$5] = arr[$5] " && " str
        } else {
            arr[$5] = str
        }
    } else {
        print $0, arr[$5]
    }
}
' file.scsv file.scsv
output:
42342;home;2020-01-12;2020-01-13;[email protected];home 2020-01-12 2020-01-13 && home 2020-04-12 2020-04-13
45235;home;2020-04-12;2020-04-13;[email protected];home 2020-01-12 2020-01-13 && home 2020-04-12 2020-04-13
68787;photo;2020-05-12;2020-05-13;[email protected];photo 2020-05-12 2020-05-13
68787;test;2020-05-12;2020-05-13;[email protected];test 2020-05-12 2020-05-13 && test 2020-05-12 2020-05-13 && test 2020-05-12 2020-05-13
68787;test;2020-05-12;2020-05-13;[email protected];test 2020-05-12 2020-05-13 && test 2020-05-12 2020-05-13 && test 2020-05-12 2020-05-13
68787;test;2020-05-12;2020-05-13;[email protected];test 2020-05-12 2020-05-13 && test 2020-05-12 2020-05-13 && test 2020-05-12 2020-05-13
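This joins the two dates with a space. If you want the underscore between the dates, as in the expected output from the question, only the concatenation in the first pass needs to change, for example:
str = $2 " " $3 "_" $4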
Upvotes: 2
Reputation: 10133
Assuming the lines are grouped by email address and the file has no blank lines, this bash script may do what you're trying to do.
#!/bin/bash

cnt=0
while IFS= read -r line; do
    email=${line##*;}                 # last field = email address
    if [ "$email" = "$prev_email" ]; then
        grpline[cnt++]=$line          # same group: just collect the line
    else
        if ((cnt)); then
            # build the common tail "DIR START_END && ..." for the finished group
            tail=
            for ((i = 0; i < cnt; ++i)); do
                IFS=';' read -ra fld <<< "${grpline[i]}"
                tail+=" && ${fld[1]} ${fld[2]}_${fld[3]}"
            done
            tail=${tail:4}            # drop the leading " && "
            # print every line of the group with the tail appended
            for ((i = 0; i < cnt; ++i)); do
                printf '%s;%s\n' "${grpline[i]}" "$tail"
            done
        fi
        prev_email=$email
        grpline[0]=$line
        cnt=1
    fi
done < <(cat file; echo)              # the trailing echo flushes the last group
If the file is not grouped by email addresses, then replace the cat file with sort -t \; -k5 file.
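With that change, the last line of the script would read, for example (file is the sample file name used above):
done < <(sort -t \; -k5 file; echo)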
Upvotes: 0
Reputation: 348
For parsing a file and outputting a post-processed one, I would use awk.
Awk has associative arrays (a.k.a. maps in most languages), which may help you do the trick.
In a nutshell, your awk script should collect the string for each email address in such an array while reading the file, then append it to the matching rows; a minimal sketch follows.
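A minimal single-pass sketch of that idea, assuming the field layout and ; separator from the sample above and that the whole file fits in memory (the file name file is a placeholder):
awk -F';' '
{
    line[NR]  = $0                          # remember the row in its original order
    str       = $2 " " $3 "_" $4            # "DIR START_END" for this row
    group[$5] = group[$5] ? group[$5] " && " str : str   # accumulate per email
    mail[NR]  = $5
}
END {
    for (i = 1; i <= NR; i++)
        print line[i] ";" group[mail[i]]    # append the per-email string to each row
}' file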
Have fun =)
Upvotes: 0