Ollie Edwards

Reputation: 14779

Merging multiple rows from a csv into a single row

I've been given a bunch of contacts in csv format like so:

companyID, companyName, contactId, firstName, lastName, email

And asked to merge all the contacts from a single company into a single row like so:

companyID, companyName, contactId, firstName, lastName, email, companyName, contactId, firstName, lastName, email...

As to why they want the data like this, I have no idea.

I'm not tied to any particular technology as long as it's freely available and I get the right result. How would you achieve this?

So far I've tried importing the data into a Postgres table and attempting various joins and recursive queries, but I can't quite come up with the right syntax.

Upvotes: 1

Views: 971

Answers (2)

amit_g

Reputation: 31270

If you have access to Unix/Linux or CygWin on Windows, you could use

sort csvFileName | awk -F, '{ if (NR > 1) printf "%s", ($1 == last ? "," : "\n"); printf "%s", $0; last = $1 } END { if (NR > 0) printf "\n" }'

This repeats the companyID for every contact, but you can change the printf of $0 to output only the columns after $1, or post-process the output to remove the repeated columns.

Upvotes: 1

DwB

Reputation: 38348

Here is a potential solution:

  1. Create a Contact class to hold all the info for one contact.
  2. Create a Company class to hold the info for one company.
  3. Create a Map<Company, List<Contact>> to map contacts to a company.
  4. Read the file, populating Company and Contact objects and the List<Contact> for each company.
  5. Iterate through the keySet of the map. For each map entry, output the company and contact info.

OpenCSV might be helpful.

If you don't find an open source CSV reader, you can split each line on the comma (,) and, in the Company and Contact classes, implement a method such as public String toCSV() to output the object as CSV.
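
A minimal sketch of the steps above, assuming Java 16+ (for records), exactly six fields per row, and no quoted fields or embedded commas (use a real CSV reader such as OpenCSV if that doesn't hold). The class name ContactMerger and the merge method are made up for illustration. It emits the company columns once per row followed by each contact's columns, which is my reading of the requested layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContactMerger {

    // Records give equals/hashCode for free, so Company can be used as a map key.
    record Company(String companyId, String companyName) {
        String toCSV() {
            return companyId + "," + companyName;
        }
    }

    record Contact(String contactId, String firstName, String lastName, String email) {
        String toCSV() {
            return String.join(",", contactId, firstName, lastName, email);
        }
    }

    /** Groups data rows (no header line) by company; returns one merged CSV line per company. */
    static List<String> merge(List<String> rows) {
        // LinkedHashMap keeps companies in the order they first appear in the input.
        Map<Company, List<Contact>> byCompany = new LinkedHashMap<>();
        for (String row : rows) {
            if (row.isBlank()) {
                continue;
            }
            // Naive split: assumes no quoted fields or embedded commas.
            String[] f = row.split(",", -1);
            if (f.length != 6) {
                throw new IllegalArgumentException("Expected 6 fields: " + row);
            }
            for (int i = 0; i < f.length; i++) {
                f[i] = f[i].trim();
            }
            Company company = new Company(f[0], f[1]);
            byCompany.computeIfAbsent(company, k -> new ArrayList<>())
                     .add(new Contact(f[2], f[3], f[4], f[5]));
        }

        List<String> merged = new ArrayList<>();
        for (Map.Entry<Company, List<Contact>> entry : byCompany.entrySet()) {
            StringBuilder line = new StringBuilder(entry.getKey().toCSV());
            for (Contact contact : entry.getValue()) {
                line.append(',').append(contact.toCSV());
            }
            merged.add(line.toString());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the shape of the output.
        List<String> rows = List.of(
            "1, Acme, 10, Ann, Lee, ann@example.com",
            "2, Globex, 20, Bob, Ray, bob@example.com",
            "1, Acme, 11, Cy, Fox, cy@example.com");
        merge(rows).forEach(System.out::println);
    }
}
```

Unlike the sort-based approach, this doesn't need the input sorted by companyID, because contacts are collected per company in the map before anything is written.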

Upvotes: 0
