Reputation: 11
I have the following dataset.I want to find quarter of which year has generated the highest number of install for each app used in the study?
Installs CR Month Year Category
0 10000 Everyone January 2018 ART_AND_DESIGN
1 500000 Everyone January 2018 ART_AND_DESIGN
2 5000000 Everyone August 2018 ART_AND_DESIGN
3 50000000 Teen June 2018 ART_AND_DESIGN
4 100000 Everyone June 2018 ART_AND_DESIGN
... ... ... ... ...
10836 5000 Everyone July 2017 FAMILY
10837 100 Everyone July 2018 FAMILY
10838 1000 Everyone January 2017 MEDICAL
10839 1000 Mature 17+ January 2015 BOOKS_AND_REFERENCE
10840 10000000 Everyone July 2018 LIFESTYLE
Upvotes: 1
Views: 43
Reputation: 862901
If need max values per quarters and Category
use:
q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
.dt.to_period('Q').rename('Quarter'))
df = df.groupby([q,'Category'])['Installs'].max().reset_index()
print (df)
Quarter Category Installs
0 2015Q1 BOOKS_AND_REFERENCE 1000
1 2017Q1 MEDICAL 1000
2 2017Q3 FAMILY 5000
3 2018Q1 ART_AND_DESIGN 500000
4 2018Q2 ART_AND_DESIGN 50000000
5 2018Q3 ART_AND_DESIGN 5000000
6 2018Q3 FAMILY 100
Or if need aggregate Installs
per quarters and categories and get queraters of maximum Installs
then use:
q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
.dt.to_period('Q').rename('Quarter'))
df1 = df.groupby([q,'Category'])['Installs'].sum().reset_index()
print (df1)
Quarter Category Installs
0 2015Q1 BOOKS_AND_REFERENCE 1000
1 2017Q1 MEDICAL 1000
2 2017Q3 FAMILY 5000
3 2018Q1 ART_AND_DESIGN 510000
4 2018Q2 ART_AND_DESIGN 50100000
5 2018Q3 ART_AND_DESIGN 5000000
6 2018Q3 FAMILY 100
7 2018Q3 LIFESTYLE 10000000
df2 = df1.loc[df1.groupby('Category')['Installs'].idxmax()]
print (df2)
Quarter Category Installs
4 2018Q2 ART_AND_DESIGN 50100000
0 2015Q1 BOOKS_AND_REFERENCE 1000
2 2017Q3 FAMILY 5000
7 2018Q3 LIFESTYLE 10000000
1 2017Q1 MEDICAL 1000
Upvotes: 2