randl57

Reputation: 23

Find duplicates in the first field, then combine text from the second field of the duplicate lines

I have file.csv with two fields similar to this:

text,something
more,somethingelse
text,another
foo,bar

I sort the file on the first field so that all the duplicates in the first column are grouped together:

foo,bar
more,somethingelse
text,something
text,another
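
For what it's worth, the command I actually run is just a plain sort (shown in my attempt below); a field-aware sketch that keys only on the first comma-separated column, assuming a POSIX sort, would be something like:

sort -t, -k1,1 file.csv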

What I need to do, but can't figure out, is to move the text in the second field onto the same line as the duplicate in the first field, separated by a ";". It doesn't matter in what order the second-field values are joined. I just want the output to be something like this:

foo,bar
more,somethingelse
text,something; another

I've tried this but it doesn't work. Not surprising since I'm just learning awk.

sort file.csv | awk 'BEGIN{last = ""; value = 0;} {if ($1 == last) {print $0, "; value";}}'

I wanted 'last' to hold the first field of the previous line and 'value' to hold its second field, but I couldn't figure out how to make that work.
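
In other words, the shape I was picturing is roughly this (only a sketch, not code I have working; the out variable here plays the role I meant for value):

sort file.csv | awk -F, '
{
    if ($1 == last) {
        out = out "; " $2        # same first field as the previous line: append
    } else {
        if (out != "") print out # flush the previous group
        out = $0                 # start a new group with the whole line
        last = $1
    }
}
END { if (out != "") print out } # flush the final group
'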

Is it possible to do this with a shell script? Thanks for any input.

Upvotes: 2

Views: 694

Answers (1)

jaypal singh

Reputation: 77105

This should work without the need for sort:

awk -F, '{
    # Append $2 if this first field has been seen before; otherwise store the whole line
    lines[$1] = (lines[$1] ? lines[$1] "; " $2 : $0)
}
END {
    for (line in lines) print lines[line]
}' file

Output:

more,somethingelse
text,something; another
foo,bar
  • Set the input field separator to ,.
  • Check whether column 1 already exists as a key in the lines array. If it does, append the second column, separated by "; ".
  • If column 1 is not yet present in the array, assign the entire line as its value.
  • In the END block, iterate over the array and print the values (see the note on ordering below).
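
One note on the for (line in lines) loop: awk does not guarantee any particular traversal order for arrays, so the merged lines can come out in any order (as in the sample output above). If the original input order matters, a sketch that also records the order in which each first field was first seen, using an extra keys array and counter n (names chosen here just for illustration), would be:

awk -F, '{
    if (!($1 in lines)) keys[++n] = $1                 # remember first-seen order
    lines[$1] = (lines[$1] ? lines[$1] "; " $2 : $0)
}
END {
    for (i = 1; i <= n; i++) print lines[keys[i]]      # print in input order
}' file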

Upvotes: 4
