user3807965

Reputation: 63

Grouping by with SQLAlchemy

Let's say I have a table of articles and I want to get the newest article for each author. How would I do that?

So I have:

class Article(Base):
    __tablename__ = "articles"
    id = Column(Integer, primary_key=True, index=True)
    text = Column(Text)
    headline = Column(Text)
    author_id = Column(Integer, ForeignKey('authors.id'))
    timestamp = Column(Integer)

and I want to get a list with one article per author (always the newest one).

Somehow I can't figure out how to do this properly (I tried multiple ways but didn't get anything satisfying). I'm using PostgreSQL.

Upvotes: 0

Views: 877

Answers (2)

snakecharmerb

Reputation: 55620

As shown in Michael Grogan's answer, you can do this by joining against a subquery that selects the author_id and their greatest timestamp. You can do this in the SQLAlchemy ORM like this:

import sqlalchemy as sa

...

# Create the subquery
subquery = (session.query(Article.author_id,
                          sa.func.max(Article.timestamp).label('mt')
                         )
                   .group_by(Article.author_id)
                   .subquery()
)

# Create the "outer" query
q = (session.query(Article)
            .join(subquery, 
                  sa.and_(Article.author_id == subquery.c.author_id, 
                          Article.timestamp == subquery.c.mt
                         )
                 )
)

for article in q:
    print(article)
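
For anyone who wants to try the query without a PostgreSQL server, here is a minimal, self-contained sketch of the same subquery join against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later. The sample rows are made up for illustration, and the model is a cut-down version of the one in the question (the foreign key to authors is dropped because no authors table is defined here).

```python
import sqlalchemy as sa
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Article(Base):
    # Simplified version of the question's model; the ForeignKey is omitted
    # because this sketch does not define an authors table.
    __tablename__ = "articles"
    id = sa.Column(sa.Integer, primary_key=True)
    headline = sa.Column(sa.Text)
    author_id = sa.Column(sa.Integer)
    timestamp = sa.Column(sa.Integer)


# In-memory SQLite stands in for PostgreSQL (an assumption for this demo).
engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Hypothetical sample data: two articles by author 1, one by author 2.
    session.add_all([
        Article(headline="Old news", author_id=1, timestamp=100),
        Article(headline="Fresh news", author_id=1, timestamp=200),
        Article(headline="Solo piece", author_id=2, timestamp=150),
    ])
    session.commit()

    # Greatest timestamp per author.
    subquery = (
        session.query(
            Article.author_id,
            sa.func.max(Article.timestamp).label("mt"),
        )
        .group_by(Article.author_id)
        .subquery()
    )

    # Keep only the articles whose (author_id, timestamp) matches the subquery.
    q = (
        session.query(Article)
        .join(
            subquery,
            sa.and_(
                Article.author_id == subquery.c.author_id,
                Article.timestamp == subquery.c.mt,
            ),
        )
        .order_by(Article.author_id)
    )

    newest = [(article.author_id, article.headline) for article in q]

print(newest)
```

Note that if an author has two articles sharing the same greatest timestamp, the join returns both of them, so you may need a tie-breaker (for example on id) if exactly one row per author is required.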

Upvotes: 1

Michael Grogan

Reputation: 1026

It looks as though you are trying to accomplish two things:

  1. Use Python to connect to the SQL database through SQLAlchemy

  2. Obtain the name of the newest article by author, i.e. effectively grouping by author with the newest title displayed for each.

You can connect from Python to the SQL database as in the example below. Replace the credentials where necessary.

In terms of obtaining the newest titles, this is done through forming a subquery to get a list of the last date for each group, and then joining this subquery to the table.

Another answer on Stack Overflow gives further information, and I have used it as the basis for the query below, substituting the column names from your table.

This should hopefully yield what you're looking for, or at least point you in the right direction. You might need to adjust a column name depending on the data in the table, since I only have the information you provided to go on.

import pandas as pd
from sqlalchemy import create_engine
# Replace the credentials and database name as needed
engine = create_engine('postgresql+psycopg2://postgres:password@localhost/articles')
# The subquery selects author_id because it is both the grouping key and the join key
df = pd.read_sql_query("SELECT t1.* FROM articles t1 JOIN (SELECT author_id, MAX(timestamp) AS timestamp FROM articles GROUP BY author_id) t2 ON t1.author_id = t2.author_id AND t1.timestamp = t2.timestamp;", engine)
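
The subquery-and-join idea described above can also be checked without PostgreSQL or pandas. The following is a rough sketch using Python's built-in sqlite3 module. The table layout mirrors the question, the sample rows are invented for illustration, and the alias max_ts is my own naming.

```python
import sqlite3

# In-memory SQLite database standing in for the real articles table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, headline TEXT, author_id INTEGER, timestamp INTEGER)"
)

# Hypothetical sample data: two articles by author 1, one by author 2.
conn.executemany(
    "INSERT INTO articles (headline, author_id, timestamp) VALUES (?, ?, ?)",
    [
        ("First post", 1, 100),
        ("Second post", 1, 200),
        ("Only post", 2, 150),
    ],
)

# The inner query finds the latest timestamp per author; the outer join keeps
# only the article rows that match that (author_id, timestamp) pair.
query = """
SELECT t1.headline, t1.author_id, t1.timestamp
FROM articles t1
JOIN (
    SELECT author_id, MAX(timestamp) AS max_ts
    FROM articles
    GROUP BY author_id
) t2
  ON t1.author_id = t2.author_id AND t1.timestamp = t2.max_ts
ORDER BY t1.author_id
"""

rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The inner query has to select author_id, since that is the column the outer query joins on.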

Upvotes: 0
