Wichert Akkerman
Wichert Akkerman

Reputation: 5318

How do you create a query returning an array from a subquery?

I have a schema which contains a list of addresses for companies. For a report I need to do some filtering for those companies and includes a list of all regions they are in. The relevant core part of the query I am trying to write is:

SELECT name, ARRAY(SELECT DISTINCT state
                   FROM location
                   WHERE location.foo_id=foo.id)
FROM foo;

The base code I am using to test this is pretty simple (note that I am using PostgreSQL as my database):

import sqlalchemy as sa
from sqlalchemy.sql import distinct, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker


Base = declarative_base()


class Company(Base):
    __tablename__ = 'company'
    id = sa.Column(sa.Integer, sa.Sequence('company_id_seq'), primary_key=True)
    name = sa.Column(sa.Unicode)


class Location(Base):
    __tablename__ = 'location'
    id = sa.Column(sa.Integer, sa.Sequence('location_id_seq'), primary_key=True)
    company_id = sa.Column(sa.Integer, sa.ForeignKey(Company.id))
    company = relationship(Company, backref='locations')
    state = sa.Column(sa.Unicode, nullable=False)

engine = sa.create_engine('postgresql:///pytest', echo=False)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
for (name, states) in [
        ('Acme', ['Alabama', 'Georgia']),
        ('Example Inc', ['Florida', 'Florida']),
        ('Empty', [])]:
    session.add(Company(
        name=name,
        locations=[Location(state=s) for s in states]))
session.flush()

# Desired result:
#
#  [('Acme', ['Alabama', 'Georgia']),
#   ('Example Inc', ['Florida']),
#   ('Empty', [])]

states_per_foo = session.query(distinct(Location.state))\
    .filter(Location.company_id == Company.id)
search_query = session.query(Company, func.array(states_per_foo))
print(search_query.all())

All permutations I have tried so far fail with various SQLAlchemy errors or invalid SQL being generated.

Upvotes: 1

Views: 6441

Answers (2)

van
van

Reputation: 76962

Below code:

sq = (
    session
    .query(Location.state.distinct())
    .filter(Location.company_id == Company.id)
    .correlate(Company)
    .as_scalar()
)
q = session .query(Company.name, func.array(sq).label("states"))

produces exactly (ignoring some extra parenthesis) the SQL query you are trying to write:

SELECT company.name,
       array(
               (SELECT DISTINCT location.state AS anon_1
                FROM location
                WHERE location.company_id = company.id)) AS states
FROM company

and the result is:

('Acme', ['Georgia', 'Alabama'])
('Example Inc', ['Florida'])
('Empty', [])

Original version: Slight modification: added .distinct() in order to remove the duplicates.

Not the SQL query you are looking for, but the result of the below query:

q = (
    session
    .query(Company.name, func.array_agg(Location.state.distinct()))
    .outerjoin(Location)
    .group_by(Company.name)
)
for x in q.all():
    print(x)

is (sorted by Company.name because of grouping):

('Acme', ['Alabama', 'Georgia'])
('Empty', [None])
('Example Inc', ['Florida'])

Note the [None] instead of [] for Empty. This can be handled in a special way as well.

Upvotes: 3

Eugene Lisitsky
Eugene Lisitsky

Reputation: 12845

import sqlalchemy as sa
engine = sa.create_engine('postgresql://...')
result = engine.execute('''SELECT name, 
                               ARRAY(SELECT DISTINCT state
                                     FROM location
                                     WHERE location.foo_id=foo.id)
                           FROM foo''')
rows = result.fetchall()
for row in rows:
   print(type(row[0]))  # <class 'str'>
   print(row[0])        # Acme
   print(type(row[1]))  # <class 'list'>
   print(row[1])        # ['Alabama', 'Georgia']

Upvotes: 0

Related Questions