Stuart Heron

Reputation: 45

Append data to another column in a CSV if duplicate is found in first column

I have a CSV with data such as:

somename1,value1
somename1,value2
somename1,value3
anothername1,anothervalue1
anothername1,anothervalue2
anothername1,anothervalue3

I would like to rewrite the CSV so that when a duplicate is found in column 1, its data is appended as a new column on the first entry.

For instance, the desired output would be:

somename1,value1,value2,value3
anothername1,anothervalue1,anothervalue2,anothervalue3

How can I do this in a shell script?

TIA

Upvotes: 0

Views: 189

Answers (2)

Inian

Reputation: 85560

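With Awk, this takes more than just removing duplicate lines: you need logic like the following to build an array entry for each unique value in $1.

The solution builds a hash map in which each unique value of $1 is an index and the corresponding element holds the $2 values joined by a , separator. It compares each $1 with the $1 of the previous line, so it assumes that rows sharing a key are adjacent, as in the sample data. The order in which for (i in unique) visits the keys is unspecified.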

awk 'BEGIN{FS=OFS=","; prev="";}{ if (prev != $1) {unique[$1]=$2;} else {unique[$1]=(unique[$1]","$2)} prev=$1; }END{for (i in unique) print i,unique[i]}' file
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3

A more readable version would look something like this:

BEGIN {
    # Set the input and output field separators to ',' and initialize
    # the variable holding the previous line's $1 to empty
    FS=OFS=","
    prev=""
}
{
    # Start a new entry when $1 differs from the previous line's key;
    # otherwise append $2 to the existing entry

    if (prev != $1) {
        unique[$1]=$2
    }
    else {
        unique[$1]=(unique[$1]","$2)
    }

    # Remember the current $1 for the next line
    prev=$1
}
END {
    for (i in unique) {
        print i,unique[i]
    }
}
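
If the rows for one key are not guaranteed to be adjacent, or the output should keep the keys in their original order, a variant along these lines might help. It is only a sketch: the function name merge_csv and the array names merged and order are made up for illustration, and it assumes a plain two-column CSV with no quoted or embedded commas.

```shell
#!/bin/sh
# merge_csv: hypothetical helper name, not taken from the answer above.
# Joins all column-2 values that share a column-1 key, keeping the
# keys in the order in which they first appear in the input.
merge_csv() {
    awk '
    BEGIN { FS = OFS = "," }
    {
        if ($1 in merged) {
            # Key seen before: append this value to its row
            merged[$1] = merged[$1] OFS $2
        } else {
            # First occurrence: record the key position and start its row
            order[++n] = $1
            merged[$1] = $2
        }
    }
    END {
        # Walk the keys in first-seen order instead of "for (k in merged)",
        # whose iteration order is unspecified
        for (i = 1; i <= n; i++)
            print order[i], merged[order[i]]
    }' "$@"
}
```

It can be called as merge_csv file > newfile, or fed from a pipe when no file name is given. Testing membership with ($1 in merged) rather than comparing against the previous line is what lets it cope with keys that reappear later in the file.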

Upvotes: 1

Mario Keller

Reputation: 421

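    FILE=$1

    # Unique keys from column 1
    NAMES=$(cut -d',' -f1 "$FILE" | sort -u)

    for NAME in $NAMES; do
       printf '%s' "$NAME"
       # Anchor the match to column 1 so a key cannot match inside a longer key or a value
       VALUES=$(grep "^$NAME," "$FILE" | cut -d',' -f2)
       for VAL in $VALUES; do
           printf ',%s' "$VAL"
       done
       echo ""
    done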

Running it with your data generates:

>bash script.sh data1.txt
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3

The filename of your data has to be passed as a parameter. The output can be written to a new file by redirecting it.

>bash script.sh data1.txt > data_new.txt

Upvotes: 1
