Reputation: 45
I have a CSV with data such as:
somename1,value1
somename1,value2
somename1,value3
anothername1,anothervalue1
anothername1,anothervalue2
anothername1,anothervalue3
I would like to rewrite the CSV so that when a duplicate in column 1 is found, the data is appended as a new column to the first entry.
For instance, the desired output would be:
somename1,value1,value2,value3
anothername1,anothervalue1,anothervalue2,anothervalue3
How can I do this in a shell script?
TIA
Upvotes: 0
Views: 189
Reputation: 85560
You need much more than just removing duplicated lines when using Awk; you need logic as below to collect the elements for each unique entry in $1. The solution creates a hash map with the unique values of $1 as the array indices, and as each element the $2 values appended with a , separator.
awk 'BEGIN{FS=OFS=","}{ if ($1 in unique) {unique[$1]=unique[$1] OFS $2} else {unique[$1]=$2} }END{for (i in unique) print i,unique[i]}' file
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3
A more readable version would be something like:
BEGIN {
    # set input and output field separators to ','
    FS = OFS = ","
}
{
    # start a new row on the first occurrence of $1; on later
    # occurrences append $2 with a ',' separator. This works even
    # when duplicate names are not adjacent in the input.
    if ($1 in unique) {
        unique[$1] = unique[$1] OFS $2
    }
    else {
        unique[$1] = $2
    }
}
END {
    for (i in unique) {
        print i, unique[i]
    }
}
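Note that for (i in unique) iterates in an unspecified order, which is why anothername1 comes out first above. If the rows should appear in the order the names first occur in the input, a minimal variant of the same idea (adding only an order array to remember first appearances) would be:
awk 'BEGIN { FS = OFS = "," }
     !($1 in unique) { order[++n] = $1 }   # remember first-seen order of names
     { unique[$1] = ($1 in unique) ? unique[$1] OFS $2 : $2 }
     END { for (j = 1; j <= n; j++) print order[j], unique[order[j]] }' file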
Upvotes: 1
Reputation: 421
FILE=$1
# collect the unique names from column 1
NAMES=$(cut -d',' -f1 "$FILE" | sort -u)
for NAME in $NAMES; do
    echo -n "$NAME"
    # anchor the pattern so that e.g. "somename1" does not also match "somename10"
    VALUES=$(grep "^$NAME," "$FILE" | cut -d',' -f2)
    for VAL in $VALUES; do
        echo -n ",$VAL"
    done
    echo ""
done
Running it with your data generates:
>bash script.sh data1.txt
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3
The filename of your data has to be passed as a parameter. The output can be written to a new file by redirection:
>bash script.sh data1.txt > data_new.txt
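Note that this script rescans the file once per unique name. A single-pass alternative (a sketch assuming Bash 4+ for associative arrays, and exactly two comma-separated columns per line) would be:
declare -A rows
while IFS=, read -r name value; do
    if [[ ${rows[$name]+set} ]]; then
        rows[$name]+=",$value"    # name seen before: append the value
    else
        rows[$name]=$value        # first occurrence: start the row
    fi
done < "$1"
# the order of "${!rows[@]}" is unspecified, like awk's 'for (i in unique)'
for name in "${!rows[@]}"; do
    printf '%s,%s\n' "$name" "${rows[$name]}"
done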
Upvotes: 1