Reputation: 3288
peewee allows bulk inserts via insert_many()
and insert_from()
, however insert_many()
allows a list of data to be inserted, but does not allow data computed from other parts of the database. insert_from()
does allow data computed from other parts of the database, but does not allow any data to be sent from python.
Assuming a model structure like so:
class BaseModel(Model):
class Meta:
database = db
class Person(BaseModel):
name = CharField(max_length=100, unique=True)
class StatusUpdate(BaseModel):
person = ForeignKeyField(Person, related_name='statuses')
status = TextField()
timestamp = DateTimeField(constraints=[SQL('DEFAULT CURRENT_TIMESTAMP')], index=True)
And some initial data:
Person.insert_many(rows=[{'name': 'Frank'}, {'name': 'Joe'}, {'name': 'Arnold'}]).execute()
print ('Person.select().count():',Person.select().count())
Output:
Person.select().count(): 3
Say we want to add a bunch new status updates, like the ones in this list:
new_status_updates = [ ('Frank', 'wat')
, ('Frank', 'nooo')
, ('Joe', 'noooo')
, ('Arnold', 'nooooo')]
We might try to use insert_many()
like so:
StatusUpdate.insert_many( rows=[{'person': 'Frank', 'status': 'wat'}
, {'person': 'Frank', 'status': 'nooo'}
, {'person': 'Joe', 'status': 'noooo'}
, {'person': 'Arnold', 'status': 'nooooo'}]).execute()
But this would fail: the person
field expects a Person
model or a Person.id
, and we would have to make an extra query to retrieve those from the names.
We might be able to avoid this with insert_from()
allows us to make subqueries, but insert_from()
has no way of processing our lists or dictionaries. What to do?
Upvotes: 1
Views: 1403
Reputation: 3288
One idea is to use the SQL VALUES
clause as part of a SELECT
statement.
If you are familiar with SQL, you may have seen the VALUES
clause before, it is commonly used as part of an INSERT
statement like so:
INSERT INTO statusupdate (person_id,status)
VALUES (1, 'my status'), (1, 'another status'), (2, 'his status');
This tells the database to insert three rows - AKA tuples - into the table statusupdate
.
Another way of inserting things though is to do something like:
INSERT INTO statusupdate (person_id,status)
SELECT ..., ... FROM <elsewhere or subquery>;
This is equivalent to the insert_from()
functionality that peewee provides.
But there is another less common thing you can do: you can use the VALUES
clause in any select to provide literal values. Example:
SELECT *
FROM (VALUES (1,2,3), (4,5,6)) as my_literal_values;
This will return a result-set of two rows/tuples, each with 3 values.
So, if you can convert the "bulk" insert into a SELECT/FROM/VALUES
statement, you can then do whatever transformations you need to do (namely, convert Person.name values to corresponding Person.id values) and then combine it with the peewee 'insert_from()` functionality.
So let us see how this would look.
First let us begin constructing the VALUES
clause itself. We want properly escaped values, so we will use question marks instead of the values for now, and put the actual values in later.
#this is gonna look like '(?,?), (?,?), (?,?)'
# or '(%s,%s), (%s,%s), (%s,%s)' depending on the database type
values_question_marks = ','.join(['(%s, %s)' % (db.interpolation,db.interpolation)]*len(new_status_updates))
The next step is to construct the values clause. Here is our first attempt:
--the %s here will be replaced by the question marks of the clause
--in postgres, you must have a name for every item in `FROM`
SELECT * FROM (VALUES %s) someanonymousname
OK, so now we have a result-set that looks like:
name | status
-----|-------
... | ...
Except! There are no column names. This will cause us a bit of heartache in a minute, so we have to figure out a way to give the result-set proper column names.
The postgres way would be to just alter the AS
clause:
SELECT * FROM (VALUES %s) someanonymousname(name,status)
sqlite3 does not support that (grr).
So we are reduced to a kludge. Luckily stackoverflow provides: Is it possible to select sql server data using column ordinal position, and we can construct something like this:
SELECT NULL as name, NULL as status WHERE 1=0
UNION ALL
SELECT * FROM (VALUES %s) someanonymousname
This works by first creating an empty result-set with the proper column-names, and then concatenating the result-set from the VALUES
clause to it. This will produce a result-set that has the proper column-names, will work in sqlite3, and in postgres.
Now to bring this back to peewee:
values_query = """
(
--a trick to make an empty query result with two named columns, to more portably name the resulting
--VALUES clause columns (grr sqlite)
SELECT NULL as name, NULL as status WHERE 1=0
UNION ALL
SELECT * FROM (VALUES %s) someanonymousname
)
"""
values_query %= (values_question_marks,)
#unroll the parameters into one large list
#this is gonna look like ['Frank', 'wat', 'Frank', 'nooo', 'Joe', 'noooo' ...]
values_query_params = [value for values in new_status_updates for value in values]
#turn it into peewee SQL
values_query = SQL(values_query,*values_query_params)
data_query = (Person
.select(Person.id, SQL('values_list.status').alias('status'))
.from_(Person,values_query.alias('values_list'))
.where(SQL('values_list.name') == Person.name))
insert_query = StatusUpdate.insert_from([StatusUpdate.person, StatusUpdate.status], data_query)
print (insert_query)
insert_query.execute()
print ('StatusUpdate.select().count():',StatusUpdate.select().count())
Output:
StatusUpdate.select().count(): 4
Upvotes: 1