AKN
AKN

Reputation: 196

Ruby - Manipulate CSV data, insert into database

I'm using a rake task requiring Ruby's CSV class to import rows of property data, and would like to manipulate this data before inserting it into the database.

CSV

PID,City,Address,Sold Date,Sold Price
100-200-300,Vancouver,510 1700 Nelson Street,01/01/2017,"$500,000 "
200-300-400,Vancouver,304 68 Smithe Street,02/02/2017,"600,000"

Residence Table (shortened for brevity)

+-----+------+------+---------------+-------------+
| pid | city | unit | street_number | street_name |
+-----+------+------+---------------+-------------+
|     |      |      |               |             |
+-----+------+------+---------------+-------------+

Rake Task (what I have so far)

require 'csv'

desc 'Upload CSV data into database'
task residences: :environment do
  residences = Array.new
  counter    = 0
  csv_file   = "#{Rails.root}/public/spreadsheets/unformatted-addresses.csv"

  CSV.foreach(csv_file, headers: true, header_converters: :symbol, converters: :all, skip_blanks: true, encoding: 'UTF-8') do |row|

    #is this the right place to create the hash?
    residences << row.to_hash

    #is this the right way to format each cell?
    residences[counter][:pid]
    residences[counter][:city].downcase
    residences[counter][:address].downcase.split(" ")
    residences[counter][:sold_date]
    residences[counter][:sold_price].delete('$ ,').to_i

    Residence.create( #what to put here? )

    counter += 1
  end

  puts "Imported #{counter} rows."
end

What I would like to achieve is to individually format the cell contents then insert into appropriate columns, for example the address format should be:

"Unit", "Street Number", "Street Name"

Would greatly appreciate any help with this!

Upvotes: 0

Views: 119

Answers (3)

AKN
AKN

Reputation: 196

The final result is below.

require 'csv'
require 'time'

namespace :csv do
  desc 'Upload CSV data into database'
  task residences: :environment do
    residences    = []
    counter       = 0
    csv_file      = "#{Rails.root}/public/spreadsheets/unformatted-addresses.csv"
    address_regex = /^(\d+[a-z]?)+\s+(\d+)+\s+(.+(?=\W))+\s+(.*)/i


    CSV.foreach(csv_file, headers: true, header_converters: :symbol, converters: :all, skip_blanks: true, encoding: 'UTF-8') do |row|
        address       = address_regex.match(row[:address])
        unit          = address[1]
        street_number = address[2]
        street_name   = address[3]
        street_type   = address[4]
        pid           = row[:pid].strip
        city          = row[:city].strip.downcase
        date          = Date.parse(row[:sold_date])
        sold_date     = date.strftime("%m-%d-%Y")
        sold_price    = row[:sold_price].strip.delete('$ ,').to_i

        puts "#{address}, #{pid}, #{city}, #{sold_date}, #{sold_price}"

        Residence.create(
          pid:           pid,
          city:          city,
          unit:          unit,
          street_number: street_number,
          street_name:   street_name,
          street_type:   street_type,
          sold_date:     sold_date,
          sold_price:    sold_price
        )

        counter += 1
    end

    puts "Imported #{counter} rows."
  end
end

Upvotes: 1

DivXZero
DivXZero

Reputation: 611

Adding onto my previous answer, you should be able to do something like this:

require 'csv'

address_regex = /(^\d+[a-z]?)+\s+(\d+)+\s+(.*)/i

desc 'Upload CSV data into database'
task residences: :environment do
  counter    = 0
  csv_file   = "#{Rails.root}/public/spreadsheets/unformatted-addresses.csv"

  CSV.foreach(csv_file, headers: true, header_converters: :symbol, converters: :all, skip_blanks: true, encoding: 'UTF-8') do |row|

    address = address_regex.match(row[:address])

    Residence.create(
      pid:           row[:pid],
      city:          row[:city],
      unit:          address[1],
      street_number: address[2],
      street_name:   address[3]
    )

    counter += 1
  end

  puts "Imported #{counter} rows."
end

Upvotes: 2

DivXZero
DivXZero

Reputation: 611

This should work for what you're trying to do, assuming that every address is going to have a unit (it will also include any units with a character such as '12A':

address_regex = /(^\d+[a-z]?)+\s+(\d+)+\s+(.*)/i

matches = address_regex.match(residences[counter][:address])

unit          = matches[1]
street_number = matches[2]
street_name   = matches[3]

Codepad Example

Note this isn't the most efficient code, I'm just aiming for clarity

Upvotes: 0

Related Questions