Reputation: 577
i have a mysql sample data resulting like below table
main_cat| sub_cat | number | org_id
Career | school | 5 | A
Career | college | 3 | A
Career | higher | 4 | A
Job | Blr | 6 | A
Job | Hyd | 11 | A
Job | Chennai | 12 | A
Career | school | 15 | B
Career | college | 30 | B
Career | higher | 5 | B
Job | Blr | 5 | B
Career | college | 8 | C
Job | Chennai | 4 | C
I want to print top 2 main_cat
for every organization. For every top 2 main_cat
, I want to print top 2 sub_cat
for every organization. So every organization should have 4 or less records with top 2 main_cat
and top 2 sub_cat
for each main_cat
Please help me.
Upvotes: 1
Views: 45
Reputation: 69032
For grouping python provides itertools.groupby()
, which groups sorted input by a given key function.
In this case the input needs to be sorted first by org_id
, then by main_cat
then by number
in descending order, e.g. if your data as a list looks like this:
data = [
['Career', 'school', 5, 'A'],
['Career', 'college', 3, 'A'],
['Career', 'higher', 4, 'A'],
['Job', 'Blr', 6, 'A'],
['Job', 'Hyd', 11, 'A'],
['Job', 'Chennai', 12, 'A'],
['Career', 'school', 15, 'B'],
['Career', 'college', 30, 'B'],
['Career', 'higher', 5, 'B'],
['Job', 'Blr', 5, 'B'],
['Career', 'college', 8, 'C'],
['Job', 'Chennai', 4, 'C']
]
then you'd sort it like this:
data.sort(key = lambda x: (x[3], x[0], -x[2]))
or by altering your sql statement to include ORDER BY main_cat, sub_cat, number DESC
, then you'll aready get it in the right order from the database.
Now you can use groupby
to group, and islice
to limit the number of results per grouped category:
from itertools import groupby, islice
from operator import itemgetter
# already sorted data
data = [
['Career', 'school', 5, 'A'],
['Career', 'higher', 4, 'A'],
['Career', 'college', 3, 'A'],
['Job', 'Chennai', 12, 'A'],
['Job', 'Hyd', 11, 'A'],
['Job', 'Blr', 6, 'A'],
['Career', 'college', 30, 'B'],
['Career', 'school', 15, 'B'],
['Career', 'higher', 5, 'B'],
['Job', 'Blr', 5, 'B'],
['Career', 'college', 8, 'C'],
['Job', 'Chennai', 4, 'C']
]
data.sort(key = lambda x: (x[3], x[0], -x[2]))
for org, by_org in groupby(data, key=itemgetter(3)):
print("org:", org)
for cat, by_cat in islice(groupby(by_org, key=itemgetter(0)), 2):
print(" cat:", cat)
for subcat, by_subcat in islice(groupby(by_cat, key=itemgetter(1)), 2):
print(" subcat:", subcat, " = ", list(by_subcat))
Output:
org: A cat: Career subcat: school = [['Career', 'school', 5, 'A']] subcat: higher = [['Career', 'higher', 4, 'A']] cat: Job subcat: Chennai = [['Job', 'Chennai', 12, 'A']] subcat: Hyd = [['Job', 'Hyd', 11, 'A']] org: B cat: Career subcat: college = [['Career', 'college', 30, 'B']] subcat: school = [['Career', 'school', 15, 'B']] cat: Job subcat: Blr = [['Job', 'Blr', 5, 'B']] org: C cat: Career subcat: college = [['Career', 'college', 8, 'C']] cat: Job subcat: Chennai = [['Job', 'Chennai', 4, 'C']]
Upvotes: 1
Reputation: 27869
You can use pandas to process your query into dataframe with read_sql:
import pandas as pd
df = pd.read_sql(connection,query)
result = df.groupby(['org_id', 'main_cat', 'sub_cat'])['number'].head(2)
Variable connection
is your connection to db and query
is your SELECT
string.
Upvotes: 0