MohitC
MohitC

Reputation: 4791

group by and count in sqlalchemy

I have a below table:

conn.execute(hotel.insert(), [
   {'country':'India', 'city' : 'Pune', 'name': 'name1'},
   {'country':'India', 'city' : 'Mumbai', 'name': 'name2'},
   {'country':'India', 'city' : 'Nagpur', 'name': 'name3'},
   {'country':'US', 'city' : 'San Jose', 'name': 'name4'},
   {'country':'US', 'city' : 'San Francisco', 'name': 'name5'},
   {'country':'US', 'city' : 'San Mateo', 'name': 'name6'},
   {'country':'Brazil', 'city' : 'abc', 'name': 'name7'},
   {'country':'Brazil', 'city' : 'pqr', 'name': 'name8'},
   {'country':'Brazil', 'city' : 'xyz', 'name': 'name9'},
   {'country':'India', 'city' : 'Pune', 'name': 'name10'},
   {'country':'India', 'city' : 'Pune', 'name': 'name11'},
   {'country':'US', 'city' : 'San Jose', 'name': 'name12'},
   {'country':'Brazil', 'city' : 'abc', 'name': 'name13'},
])

I want to find out one city for each country that has the maximum number of entries.

I followed this example and got this output with below query: Query:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
from sqlalchemy import func
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///hotel.db', echo = True)
meta = MetaData()

hotel = Table(
   'hotel', meta,
   Column('country', String),
   Column('city', String),
   Column('name', String),
)

Session = sessionmaker(bind = engine)
session = Session()
print(session.query(hotel.columns.city, func.count(hotel.columns.city)).group_by(hotel.columns.country).all())

Output:

[('abc', 4), ('Pune', 5), ('San Jose', 4)]

It is giving all the cities with maximum entries but the count being displayed is for number of entries for a country? Not sure how sqlalchemy works sorry if its naive.

Upvotes: 3

Views: 5312

Answers (1)

rfkortekaas
rfkortekaas

Reputation: 6474

You need to use a subquery to count the cities per country and than select the max from count of the subquery:

subq = select(hotel.c.country,
              hotel.c.city,
              func.count(hotel.c.city)
              .label('count'))\
              .group_by(hotel.c.country,
                        hotel.c.city)\
              .subquery()
stmt = select(subq.c.city,
              subq.c.count)\
              .having(func.max(subq.c.count))\
              .group_by(subq.c.country)
print(session.execute(stmt).all())

Upvotes: 2

Related Questions