Reputation: 1120
I have a simple 4-column Excel spreadsheet that matches universities to their ID codes for lookup purposes. The file is pretty big (300k).
I need to come up with a way to turn this data into a populated table in my Rails app. The catch is that this is a document that is updated now and then, so it can't just be a one-time solution. Ideally, it would be some sort of ruby script that would read the file and create the entries automatically so that when we get emailed a new version, we can just update it automatically. I'm on Heroku if that matters at all.
How can I accomplish something like this?
Upvotes: 3
Views: 2663
Reputation: 2557
Parsing newish Excel files isn't too much trouble using Hpricot. This will give you a two-dimensional array:
require 'hpricot'
doc = open("data.xlsx") { |f| Hpricot(f) }
rows = doc.search('row')
rows = rows[1..rows.length] # Skips the header row
rows = rows.map do |row|
columns = []
row.search('cell').each do |cell|
# Excel stores cell indexes rather than blank cells
next_index = (cell.attributes['ss:Index']) ? (cell.attributes['ss:Index'].to_i - 1) : columns.length
columns[next_index] = cell.search('data').inner_html
end
columns
end
Upvotes: 1
Reputation: 5455
If you can, save the spreadsheet as CSV, there's much better gems for parsing CSV files than for parsing excel spreadsheets. I found an effective way of handling this kind of problem is to make a rake task that reads the CSV file and creates all the records as appropriate.
So for example, here's how to read all the lines from a file using the old, but still effective FasterCSV gem
data = FasterCSV.read('lib/tasks/data.csv')
columns = data.remove(0)
unique_column_index = -1#The index of a column that's always unique per row in the spreadsheet
data.each do | row |
r = Record.find_or_initialize_by_unique_column(row[unique_column_index])
columns.each_with_index do | index, column_name |
r[column_name] = row[index]
end
r.save! rescue => e Rails.logger.error("Failed to save #{r.inspect}")
end
It does kinda rely on you having a unique column in the original spreadsheet to go off though.
If you put that into a rake task, you can then wire it into you're Capistrano deploy script, so it'll be run every time you deploy. the find_or_initialize should ensure you shouldn't get duplicate records.
Upvotes: 3