Reputation: 632
I'm using sqlalchemy to insert data into a table. The things I care about:
on_conflict_do_nothing
(I'm using postgres).As a minimal example, let's say I have two tables:
from sqlalchemy import Column, Integer, ForeignKey, String
Base = declarative_base()
class Address(Base):
__tablename__ = "addresses"
id = Column(Integer, primary_key=True)
street = Column(String)
number = Column(Integer)
class Person(Base)
__tablename__ = "people"
id = Column(Integer, primary_key=True)
name = Column(String)
address_id = Column(Integer, ForeignKey("addresses.id")
address = relationship("Address")
I can batch up all the Address
inserts easily enough with:
from sqlalchemy.dialects.postgresql import insert
values = [{"street": "main", "number": 1}, {"street": "main", "number": 2}]
statement = insert(Address).values(values).on_conflict_do_nothing()
session.execute(statement)
The question is how do I then do the same with the People
? I'm struggling with what to supply for "address"
, e.g.
values = [{"name": "John", "address": ????}]
statement = insert(Person).values(values).on_conflict_do_nothing()
session.execute(statement)
I'm assuming I need the Address
object but then I don't know where to go from there?
Upvotes: 1
Views: 6994
Reputation: 13129
Don't use address
, since it's not an SQL field. You can use INSERT...RETURNING to get the IDs of the addresses you just inserted, and then set values[0]['address_id']
based on that. Something like this should work:
from sqlalchemy.dialects.postgresql import insert
values = [{"street": "main", "number": 1}, {"street": "main", "number": 2}]
statement = insert(Address)\
.values(values)\
.returning(Address.id)\
.on_conflict_do_nothing()
address_ids = [address_id for address_id, in session.execute(statement)]
values = [{"name": "John", "address_id": address_ids[0]}]
statement = insert(Person).values(values).on_conflict_do_nothing()
session.execute(statement)
Explanation
With insert().on_conflict_do_nothing
, you're getting relatively close to the bare metal of PostgreSQL, compared to the added layers of SQLAlchemy. This is not a bad thing, it is of course where you have the most control and where the greatest speed ups happen, but it does mean that you need to think carefully about which fields are added by SQLAlchemy, and which fields are directly reflected by the database.
In your case, only id
, name
, and address_id
are actual database columns. If you execute SELECT * FROM people
, they are the only columns that should show up. In contrast, address
is added by SQLAlchemy to give you a direct link to the Address
instance with id address_id
, but under water, a different call to the database is performed to fetch it, if necessary.
If you have an Address
with id 1, then setting john.address = 1
doesn't work, because 1 is not an address, it's a number. However, setting john.address_id = 1
, does work, because you're setting a field directly. SQLAlchemy discourages this if you're using the full ORM (which you're not), because you do this outside of their ecosystem. If you run john.address = Address.query.get(1)
, you assign the entire Address object to address, and SQLAlchemy updates address_id
to 1 internally.
Upvotes: 4