Kevin
Kevin

Reputation: 1120

How do I populate a table from an Excel Spreadsheet in Rails?

I have a simple 4-column Excel spreadsheet that matches universities to their ID codes for lookup purposes. The file is pretty big (300k).

I need to come up with a way to turn this data into a populated table in my Rails app. The catch is that this is a document that is updated now and then, so it can't just be a one-time solution. Ideally, it would be some sort of ruby script that would read the file and create the entries automatically so that when we get emailed a new version, we can just update it automatically. I'm on Heroku if that matters at all.

How can I accomplish something like this?

Upvotes: 3

Views: 2663

Answers (2)

elektronaut
elektronaut

Reputation: 2557

Parsing newish Excel files isn't too much trouble using Hpricot. This will give you a two-dimensional array:

require 'hpricot'

doc  = open("data.xlsx") { |f| Hpricot(f) }
rows = doc.search('row')
rows = rows[1..rows.length] # Skips the header row

rows = rows.map do |row|
    columns = []
    row.search('cell').each do |cell|
        # Excel stores cell indexes rather than blank cells
        next_index          = (cell.attributes['ss:Index']) ? (cell.attributes['ss:Index'].to_i - 1) : columns.length
        columns[next_index] = cell.search('data').inner_html
    end
    columns
end

Upvotes: 1

Ceilingfish
Ceilingfish

Reputation: 5455

If you can, save the spreadsheet as CSV, there's much better gems for parsing CSV files than for parsing excel spreadsheets. I found an effective way of handling this kind of problem is to make a rake task that reads the CSV file and creates all the records as appropriate.

So for example, here's how to read all the lines from a file using the old, but still effective FasterCSV gem

data = FasterCSV.read('lib/tasks/data.csv')
columns = data.remove(0)
unique_column_index = -1#The index of a column that's always unique per row in the spreadsheet
data.each do | row |
  r = Record.find_or_initialize_by_unique_column(row[unique_column_index])
  columns.each_with_index do | index, column_name |
    r[column_name] = row[index]
  end
  r.save! rescue => e Rails.logger.error("Failed to save #{r.inspect}")
end

It does kinda rely on you having a unique column in the original spreadsheet to go off though.

If you put that into a rake task, you can then wire it into you're Capistrano deploy script, so it'll be run every time you deploy. the find_or_initialize should ensure you shouldn't get duplicate records.

Upvotes: 3

Related Questions