randl57

Reputation: 23

Find duplicates in the first field, then combine text from the second field of the duplicate lines

I have file.csv with two fields similar to this:

text,something
more,somethingelse
text,another
foo,bar

I sort the file on the first field so that all the duplicates in the first column are grouped together:

foo,bar
more,somethingelse
text,something
text,another
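
For what it's worth, the command I actually run is just a plain sort (shown in my attempt below); a field-aware sketch that keys only on the first comma-separated column, assuming a POSIX sort, would be something like:

sort -t, -k1,1 file.csv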

What I need to do, but can't figure out, is to move the text in the second field onto the same line as the duplicate in the first field, separated by a ";". It doesn't matter in what order the second-field values are joined. I just want the output to be something like this:

foo,bar
more,somethingelse
text,something; another

I've tried this but it doesn't work. Not surprising since I'm just learning awk.

sort file.csv | awk 'BEGIN{last = ""; value = 0;} {if ($1 == last) {print $0, "; value";}}'

I wanted 'last' to hold the first field of the previous line and 'value' to hold its second field, but I couldn't figure out how to make that work.
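
In other words, the shape I was picturing is roughly this (only a sketch, not code I have working; the out variable here plays the role I meant for value):

sort file.csv | awk -F, '
{
    if ($1 == last) {
        out = out "; " $2        # same first field as the previous line: append
    } else {
        if (out != "") print out # flush the previous group
        out = $0                 # start a new group with the whole line
        last = $1
    }
}
END { if (out != "") print out } # flush the final group
'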

Is it possible to do this with a shell script? Thanks for any input.

Upvotes: 2

Views: 694

Answers (1)

jaypal singh

Reputation: 77105

This should work without the need for sort:

awk -F, '{
    # Append $2 if this first field has been seen before; otherwise store the whole line
    lines[$1] = (lines[$1] ? lines[$1] "; " $2 : $0)
}
END {
    for (line in lines) print lines[line]
}' file

Output:

more,somethingelse
text,something; another
foo,bar
  • Set the input field separator to ,.
  • Check whether column 1 already exists as a key in the lines array. If it does, append the second column, separated by "; ".
  • If column 1 is not yet present in the array, assign the entire line as its value.
  • In the END block, iterate over the array and print the values (see the note on ordering below).
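
One note on the for (line in lines) loop: awk does not guarantee any particular traversal order for arrays, so the merged lines can come out in any order (as in the sample output above). If the original input order matters, a sketch that also records the order in which each first field was first seen, using an extra keys array and counter n (names chosen here just for illustration), would be:

awk -F, '{
    if (!($1 in lines)) keys[++n] = $1                 # remember first-seen order
    lines[$1] = (lines[$1] ? lines[$1] "; " $2 : $0)
}
END {
    for (i = 1; i <= n; i++) print lines[keys[i]]      # print in input order
}' file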

Upvotes: 4
