Reputation: 6321
I have a MySQL database with 2 tables, `products` and `product_variants`. A product has many product variants. Here is a sample:
products
+----+------+
| id | name |
+----+------+
|  1 | Foo  |
|  2 | Bar  |
+----+------+
product_variants
+----+------------+--------+
| id | product_id | value  |
+----+------------+--------+
| 10 |          1 | red    |
| 11 |          1 | green  |
| 12 |          1 | blue   |
| 13 |          2 | red    |
| 14 |          2 | yellow |
+----+------------+--------+
Now I need to bulk insert a lot of products with their variants in the most efficient and fastest way. I have a JSON with many products (100k+) like this:
[
{
"name": "Foo",
"variants": [{ "value": "red" }, { "value": "green" }, { "value": "blue" }]
},
{
"name": "Bar",
"variants": [{ "value": "red" }, { "value": "yellow" }]
},
...
]
from which I should generate a query to insert the products.
My idea is to use an `INSERT` query like this:
INSERT INTO `products` (name) VALUES ("foo"), ("bar"), ...;
But then I don't know which `product_id` (foreign key) to use in the insert query for `product_variants`:
INSERT INTO `product_variants` (product_id,value) VALUES (?,"red"), (?,"green"), ...;
(both queries run inside a transaction)
I've thought about assigning the product ids manually, incrementing from the last id, but I get errors when concurrent connections insert products at the same time, or when two or more bulk insert processes run concurrently.
What strategy can I use to achieve my goal? Is there a standard way to do this?
ps: if possible I would not want to change the structure of the 2 tables.
Upvotes: 4
Views: 2885
Reputation: 6321
In the end I've used a strategy based on the MySQL function `LAST_INSERT_ID()`, as @sticky-bit said, but with a bulk insert (one `INSERT` for many products), which is much faster.
I attach a simple Ruby script that performs the bulk insertions. Everything seems to work well, even with concurrent insertions.
I've run the script with `innodb_autoinc_lock_mode = 2` and all seems good, but I don't know whether it is necessary to set the variable to 1:
require 'active_record'
require 'benchmark'
require 'mysql2'
require 'securerandom'
ActiveRecord::Base.establish_connection(
adapter: 'mysql2',
host: 'localhost',
username: 'root',
database: 'test',
pool: 200
)
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
end
class Product < ApplicationRecord
has_many :product_variants
end
class ProductVariant < ApplicationRecord
belongs_to :product
COLORS = %w[red blue green yellow pink orange].freeze
end
def migrate
ActiveRecord::Schema.define do
create_table(:products) do |t|
t.string :name
end
create_table(:product_variants) do |t|
t.references :product, null: false, foreign_key: true
t.string :color
end
end
end
def generate_data
d = []
100_000.times do
d << {
name: SecureRandom.alphanumeric(8),
product_variants: Array.new(rand(1..3)).map do
{ color: ProductVariant::COLORS.sample }
end
}
end
d
end
DATA = generate_data.freeze
def bulk_insert
# All inside a transaction
ActiveRecord::Base.transaction do
# Insert products (note: values are interpolated without escaping;
# fine for this random alphanumeric test data, not for real input)
values = DATA.map { |row| "('#{row[:name]}')" }.join(',')
q = "INSERT INTO products (name) VALUES #{values}"
ActiveRecord::Base.connection.execute(q)
# LAST_INSERT_ID() returns the id generated for the FIRST row of the last multi-row INSERT
q = 'SELECT LAST_INSERT_ID()'
last_id, = ActiveRecord::Base.connection.execute(q).first
# Insert product variants
values = DATA.each_with_index.flat_map do |row, i|
  row[:product_variants].map { |subrow| "(#{last_id + i},'#{subrow[:color]}')" }
end.join(',')
q = "INSERT INTO product_variants (product_id,color) VALUES #{values}"
ActiveRecord::Base.connection.execute(q)
end
end
migrate
threads = []
# Spawn 100 threads that perform 200 single inserts each
100.times do
threads << Thread.new do
200.times do
Product.create(name: 'CONCURRENCY NOISE')
end
end
end
threads << Thread.new do
Benchmark.bm do |benchmark|
benchmark.report('Bulk') do
bulk_insert
end
end
end
threads.map(&:join)
After running the script I've checked that every product has associated variants with the query
SELECT *
FROM products
LEFT OUTER JOIN product_variants
ON (products.id = product_variants.product_id)
WHERE product_variants.product_id IS NULL
AND name != "CONCURRENCY NOISE";
and, as expected, I get no rows.
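The id-assignment step of the script can be isolated from the database and checked on its own. A minimal sketch (the `variant_rows` helper and the sample data are hypothetical, not part of the script above):

```ruby
# Given the auto-generated id of the FIRST row of a multi-row INSERT,
# products receive consecutive ids, so each variant's product_id can
# be derived from the product's position in the batch.
def variant_rows(data, first_id)
  data.each_with_index.flat_map do |row, i|
    row[:product_variants].map { |v| [first_id + i, v[:color]] }
  end
end

data = [
  { name: 'Foo', product_variants: [{ color: 'red' }, { color: 'green' }] },
  { name: 'Bar', product_variants: [{ color: 'yellow' }] }
]

p variant_rows(data, 10)
# => [[10, "red"], [10, "green"], [11, "yellow"]]
```

Note this only holds while ids within one bulk `INSERT` are consecutive, which is exactly why the `innodb_autoinc_lock_mode` setting matters.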
Upvotes: 0
Reputation: 142298
If you already have the JSON in a table, then it can probably be done (quite efficiently) with two statements:
INSERT INTO Products (name)
SELECT name
FROM original_table; -- to get the product names
INSERT INTO Variants (product_id, `value`)
SELECT ( SELECT id FROM Products WHERE name = ot.name ),
`value`
FROM original_table AS ot;
In reality, `name` and `value` would need to be suitable JSON expressions to extract the values.
If you are concerned about lots of duplicate "products" in the first table, be sure to have `UNIQUE(name)`. And you can avoid "burning" ids by the 2-step process described here: mysql.rjweb.org/doc.php/staging_table#normalization
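For illustration, the flattening those JSON expressions would perform can be sketched client-side in plain Ruby, operating on the JSON from the question (variable names are illustrative):

```ruby
require 'json'

json = <<~JSON
  [
    { "name": "Foo", "variants": [{ "value": "red" }, { "value": "green" }, { "value": "blue" }] },
    { "name": "Bar", "variants": [{ "value": "red" }, { "value": "yellow" }] }
  ]
JSON

products = JSON.parse(json)

# Rows for the first INSERT ... SELECT (product names only)
names = products.map { |p| p['name'] }

# Rows for the second INSERT ... SELECT; the name is what the
# subquery later resolves to a product id
pairs = products.flat_map { |p| p['variants'].map { |v| [p['name'], v['value']] } }

p names  # => ["Foo", "Bar"]
p pairs  # => [["Foo", "red"], ["Foo", "green"], ["Foo", "blue"], ["Bar", "red"], ["Bar", "yellow"]]
```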
Upvotes: 0
Reputation: 37472
You can use `last_insert_id()` to get the last generated ID from the previous statement. But since this, as mentioned, only returns the last ID of that statement, it requires that you handle each product individually. You can still bulk insert the variants of each product, though. And given the structure of the JSON, I'd think that even makes it easier to traverse. Each product and its variants should be inserted in a transaction, so that the variants don't get added to the previous product if the `INSERT` into the products table fails for some reason.
START TRANSACTION;

INSERT INTO products (name)
VALUES ('Foo');

INSERT INTO product_variants (product_id, value)
VALUES (last_insert_id(), 'red'),
       (last_insert_id(), 'green'),
       (last_insert_id(), 'blue');

COMMIT;

START TRANSACTION;

INSERT INTO products (name)
VALUES ('Bar');

INSERT INTO product_variants (product_id, value)
VALUES (last_insert_id(), 'red'),
       (last_insert_id(), 'yellow');

COMMIT;
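A minimal Ruby sketch of that per-product traversal, emitting one such transaction per product (the `product_sql` helper is hypothetical, and real input would need proper escaping):

```ruby
# Build one transaction per product: INSERT the product first, then
# bulk INSERT its variants using last_insert_id() as the foreign key.
def product_sql(product)
  variants = product['variants']
             .map { |v| "(last_insert_id(), '#{v['value']}')" }
             .join(', ')
  <<~SQL
    START TRANSACTION;
    INSERT INTO products (name) VALUES ('#{product['name']}');
    INSERT INTO product_variants (product_id, value) VALUES #{variants};
    COMMIT;
  SQL
end

puts product_sql('name' => 'Foo',
                 'variants' => [{ 'value' => 'red' }, { 'value' => 'green' }])
```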
Upvotes: 2