How do I merge two CSV's with nearly identical sets using rules for which data is kept? (Using Ruby & FasterCSV)

Question

I have two csv files, each with 13 columns.

The first column of each row contains a unique string. Some are duplicated in each file, some only exist in one file.

If the row exists in only one file I want to keep it in the new file.

If it exists in both I want to keep the one that has a certain value (or lacks a certain value) in a certain column of that same row.

For example:

file 1:

D600-DS-1991, name1, address1, date1
D601-DS-1991, name2, address2, date2
D601-DS-1992, name3, address3, date3

file 2:

D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2

I would keep the first row of the first file because the fourth column contains date instead of time. I would keep the second row of the first file since its first column, first row value is unique. I would keep the second row of the second file as the third row of the new file because it contains text other than "name#" in the second column.

Should I first map all of the unique values to one another so that each file contains the same number of entries - even if some are blank or just have filler data?

I only know a little ruby and python... but I much prefer to solve this with a single Ruby file if at all possible since I will be able to understand the code better. If you can't do it in Ruby then please feel free to answer differently!

Anthony · Accepted Answer

I'm not super happy with my solution but it works:

require 'csv'

def readcsv(filename)
  csv = {}
  CSV.foreach(filename) do |line|
    csv[line[0]] = { name: line[1], address: line[2], date: line[3] }
  end
  csv
end

csv1 = readcsv('orders1.csv')
csv2 = readcsv('orders2.csv')

results = {}
csv1.each do |id, val|
  unless csv2[id]
    results[id] = val # checks to see if it only exists in 1 file
    next
  end

  #see if name exists
  if (val[:name] =~ /name/) && (csv2[id]) && (csv2[id][:name] =~ /name/).nil?
    csv1.delete(id)
  end

  #missing some if statement regarding date vs. time
end

results = results.merge(csv2) # merge together whatever is remaining

CSV.open('newfile.csv', 'w') do |csv|
  results.each do |key, val|
    row = []
    csv << (row.push(key, val.values)).flatten
  end
end

Output of newfile.csv :

D601-DS-1991, name2, address2, date2
D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2

How do I merge two CSV's with nearly identical sets using rules for which data is kept? (Using Ruby & FasterCSV)

Answers (2)

Related Questions

How do I merge two CSV&#39;s with nearly identical sets using rules for which data is kept? (Using Ruby &amp; FasterCSV)

Answers (2)

Related Questions

How do I merge two CSV's with nearly identical sets using rules for which data is kept? (Using Ruby & FasterCSV)