Append data to another column in a CSV if duplicate is found in first column

Question

I have a CSV with data such as:

somename1,value1
somename1,value2
somename1,value3
anothername1,anothervalue1
anothername1,anothervalue2
anothername1,anothervalue3

I would like to rewrite the CSV so that when a duplicate in column 1 is found, the data is appended as a new column on the first entry.

For instance, the desired output would be:

somename1,value1,value2,value3
anothername1,anothervalue1,anothervalue2,anothervalue3

How can I do this in a shell script?

TIA

Inian · Accepted Answer

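With Awk, simply removing duplicated lines is not enough here. You need logic like the one below, which builds an array with one element for each unique entry in $1.

The solution creates a hash map in which each unique value of $1 is an index and the element is the list of $2 values joined with a , separator. It compares $1 only with the previous line's $1, so it assumes rows sharing a key are adjacent, as in the sample input. Also, for (i in unique) visits keys in an unspecified order, which is why the output below is not in input order.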

awk 'BEGIN{FS=OFS=","; prev="";}{ if (prev != $1) {unique[$1]=$2;} else {unique[$1]=(unique[$1]","$2)} prev=$1; }END{for (i in unique) print i,unique[i]}' file
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3

A more readable version would look something like this:

BEGIN {
    # set input and output field separator to ',' and initialize 
    # variable holding last instance of $1 to empty
    FS=OFS=","
    prev=""
}
{
    # Start a new entry in the hash array when $1 differs from the
    # previous line; otherwise append $2 to the existing entry

    if (prev != $1){
        unique[$1]=$2
    } 
    else {
        unique[$1]=(unique[$1]","$2)
    }   

    # Remember the current $1 for comparison with the next line
    prev=$1
}
END {
    for (i in unique) {
        print i,unique[i]
    }
}
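
The version above relies on duplicate keys being on adjacent lines and prints keys in whatever order awk iterates the array. As a minimal sketch under different assumptions (two-column input, no quoted commas inside fields), the variant below tests membership with the in operator instead of comparing against the previous line, and records first-seen order in a second array. The function name merge_by_first_column and the array names merged and order are illustrative only, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the answer above):
# merge rows that share the same first column, keeping keys in the order
# in which they first appear in the input.
merge_by_first_column() {
    awk '
    BEGIN { FS = OFS = "," }
    {
        if (!($1 in merged)) {
            # First time this key is seen: record its position and value
            order[++n] = $1
            merged[$1] = $2
        } else {
            # Key seen before (adjacent or not): append the new value
            merged[$1] = merged[$1] OFS $2
        }
    }
    END {
        # Walk keys by recorded position, since "for (k in merged)"
        # has no defined iteration order
        for (i = 1; i <= n; i++) print order[i], merged[order[i]]
    }' "$@"
}

# Example: the second "somename1" row is not adjacent to the first
printf '%s\n' 'somename1,value1' 'anothername1,anothervalue1' 'somename1,value2' |
    merge_by_first_column
```

For this three-line input it should print somename1,value1,value2 followed by anothername1,anothervalue1. Passing a file name as an argument (merge_by_first_column file) reads from that file instead of standard input.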
