Reputation: 7099
I have a database with thousands of records:
Code | Name | Price
00106 | Water | 9.99
00107 | Onion | 8.99
The data is encoded in a GES file like the one below. A line of 00F marks a column-header record and 00I marks a row insert; there are also other markers, such as 00D for deleting a row and 00U for updating one.
00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899
I want to create an importer which processes this file and pushes the data into my database, so I started implementing it:
class Importer
  CONN = ActiveRecord::Base.connection
  F = "00F"
  I = "00I"

  def extract_to_database(collection)
    add = true
    tmp = []
    type = F
    inserts = []

    collection.each_with_index do |line, i|
      _type = line.strip
      _changed = [F, I].include? _type

      if _changed && i > 0
        case type
        when F then @f << tmp
        when I
          group_id = Group.find_by(code: tmp[1]).id
          inserts.push "(#{group_id},'#{tmp[2]}','#{tmp[3]}')"
        end

        tmp = []
        type = _type
      end

      tmp << line
    end

    sql = "INSERT INTO products (`group_id`, `name`, `price`) VALUES #{inserts.join(", ")}"
    CONN.execute sql
  end
end
There is one problem with that: I would like to refactor it using functional programming. I will also have to look up another model by code and put the related some_model_id into the products table, which complicates the whole process, because right now importing this data takes a few hours. Maybe Ruby is not the best option.
Upvotes: 1
Views: 156
Reputation: 211610
There's nothing here that Ruby can't handle. It's not clear how "functional programming" would help this any, either, as this is a classic state-machine sort of problem with some simple data transformation going on.
Example scaffold:
class SomethingImporter
  FIELD_MARKER = "00F"
  INSERT_MARKER = "00I"

  COLUMNS = %w[ group_id name price ]

  # Performs the insert into a given model. This should probably be a class
  # method on the model itself.
  def bulk_insert(model, rows)
    sql = [
      "INSERT INTO `#{model.table_name}` (#{COLUMNS.collect { |c| "`#{c}`" }.join(',')}) VALUES "
    ]

    # Append the placeholders: (?,?,?),(?,?,?),...
    sql[0] += ([ '(%s)' % ([ '?' ] * COLUMNS.length).join(',') ] * rows.length).join(',')

    sql += rows.flatten

    model.connection.execute(model.send(:sanitize_sql, sql))
  end
  # Resolve a group code to a group_id value, and cache the result so that
  # subsequent look-ups for the same code are valid.
  def group_id(group_code)
    @find_group ||= { }

    # This tests if any value has been cached for this code, including one
    # that might be nil.
    if (@find_group.key?(group_code))
      return @find_group[group_code]
    end

    group = Group.find_by(code: group_code)

    @find_group[group_code] = group && group.id
  end
  # Call this with the actual collection, lines stripped, and with any header
  # lines removed (e.g. collection.shift)
  def extract_rows(collection)
    state = nil
    rows = [ ]
    row = [ ]

    collection.each do |line|
      case (line)
      when FIELD_MARKER
        # Indicates field data to follow
        state = :field
      when INSERT_MARKER
        case (state)
        when :insert
          rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
        end

        state = :insert
        row = [ ]
      else
        case (state)
        when :field
          # Presumably you'd pay attention to the data here and establish
          # a mapping table.
        when :insert
          row << line.sub(/^\d\d/, '')
        end
      end
    end

    case (state)
    when :insert
      rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
    end

    rows
  end
end
data = <<END
00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899
END
importer = SomethingImporter.new
puts importer.extract_rows(data.split(/\n/)).inspect
The example output from this, based on your data, looks like:
[["00106", "Water", 9.99], ["00107", "Onion", 8.99]]
When writing code like this, be sure to expose intermediate results in order to be able to test what's happening. Your implementation takes data and dumps it directly in the database in one shot, making it very hard to tell where something's going wrong if it doesn't work out properly. This version is composed of several methods, each of which has a more specific purpose.
It's not clear from your original example why you're resolving group_id at all, since your sample output has nothing to do with it, but as an example I've included a method that resolves codes and caches the results, avoiding repeated lookups of the same thing. For a larger-scale import you'd probably load in many rows, extract the distinct group codes, look them all up at once, and remap them before inserting.
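That batched remapping step can be sketched roughly like this. A plain Hash stands in for the result of a hypothetical Group.where(code: codes).pluck(:code, :id) query, since your actual schema isn't shown:

```ruby
# Rows as produced by extract_rows: [code, name, price]
rows = [["00106", "Water", 9.99], ["00107", "Onion", 8.99]]

# Collect the distinct codes so they can be resolved in a single query.
codes = rows.map(&:first).uniq

# In the real importer (assumption about your schema):
#   id_by_code = Group.where(code: codes).pluck(:code, :id).to_h
id_by_code = { "00106" => 1, "00107" => 2 } # stubbed lookup result

# Remap each row's code to its group_id before handing off to bulk_insert.
resolved = rows.map { |code, name, price| [id_by_code[code], name, price] }
# resolved => [[1, "Water", 9.99], [2, "Onion", 8.99]]
```

One query for all the codes replaces one find_by per row, which matters when the import is slow to begin with.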
Upvotes: 2