Reputation: 1294
Using this data
import pandas as pd
df=pd.read_excel(
"https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=True"
)
df["date"] = pd.to_datetime(df['date']) df.head()
I grouped ext price
with name
by month using this code
df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()
Out :
date name
2014-01-31 Barton LLC 6177.57
Cronin, Oberbrunner and Spencer 1141.75
Frami, Hills and Schmidt 5112.34
Fritsch, Russel and Anderson 15130.77
Halvorson, Crona and Champlin 9997.17
Herman LLC 10749.84
Jerde-Hilpert 11274.33
Kassulke, Ondricka and Metz 7322.83
Keeling LLC 6847.86
Kiehn-Spinka 8097.50
Koepp Ltd 10768.33
Kuhn-Gusikowski 7309.54
Kulas Inc 15398.87
Pollich LLC 1004.22
Purdy-Kunde 4689.37
Sanford and Sons 9544.13
Stokes LLC 5809.34
Trantow-Barrows 14328.26
White-Trantow 13703.77
Will LLC 20953.87
2014-02-28 Barton LLC 12218.03
Cronin, Oberbrunner and Spencer 13976.26
Frami, Hills and Schmidt 4124.53
Fritsch, Russel and Anderson 9595.35
Halvorson, Crona and Champlin 7082.15
Herman LLC 5831.40
Jerde-Hilpert 4088.40
Kassulke, Ondricka and Metz 3061.12
Keeling LLC 3383.45
Kiehn-Spinka 3461.12
...
2014-11-30 Koepp Ltd 4882.27
Kuhn-Gusikowski 7197.89
Kulas Inc 4149.34
Pollich LLC 6334.21
Purdy-Kunde 2376.00
Sanford and Sons 6834.04
Stokes LLC 6158.81
Trantow-Barrows 6550.10
White-Trantow 9544.61
Will LLC 3210.44
2014-12-31 Barton LLC 2772.90
Cronin, Oberbrunner and Spencer 7640.60
Frami, Hills and Schmidt 16249.81
Fritsch, Russel and Anderson 12345.64
Halvorson, Crona and Champlin 2900.51
Herman LLC 4664.54
Jerde-Hilpert 6941.99
Kassulke, Ondricka and Metz 4425.22
Keeling LLC 13247.88
Kiehn-Spinka 17401.28
Koepp Ltd 11791.00
Kuhn-Gusikowski 4959.85
Kulas Inc 6106.38
Pollich LLC 12357.76
Purdy-Kunde 4051.79
Sanford and Sons 2151.48
Stokes LLC 6366.26
Trantow-Barrows 10124.23
White-Trantow 4806.93
Will LLC 12561.21
Name: ext price, Length: 240, dtype: float64
Now, I'm trying to get the top 5 name
(top ext price
) for each month
I've tried nlargest(5)
but its not working
Function head(5)
does not solve the probleme also
Upvotes: 1
Views: 88
Reputation: 4743
I believe this is what you are looking for:
df = df.groupby([pd.Grouper(key="date", freq="M"), "name"])["ext price"].sum()
out = df.groupby(level=0).nlargest(5)
out.index = out.index.droplevel()
Then you get following data:
In [134]: out.head(10)
Out[134]:
date name
2014-01-31 Will LLC 20953.87
Kulas Inc 15398.87
Fritsch, Russel and Anderson 15130.77
Trantow-Barrows 14328.26
White-Trantow 13703.77
2014-02-28 Cronin, Oberbrunner and Spencer 13976.26
Kulas Inc 13973.65
Will LLC 13613.06
Barton LLC 12218.03
White-Trantow 11783.98
dtype: float64
Upvotes: 0
Reputation: 14103
Another option is nlargest
but probably not faster than James' suggestion as sorting and getting the head
or tail
should be faster than nlargest
:
new = df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()
new.groupby(level=0).nlargest(5).sort_index().reset_index(level=1, drop=True).to_frame()
Upvotes: 1
Reputation: 36658
There might be a quicker way to do this, but you can sort by the price, group by the date, grab the top 5, then sort the index (which is also the date).
s = df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()
out = (s.reset_index(level=-1)
.sort_values('ext price', ascending=False)
.groupby('date')
.head(5)
.sort_index()
)
out
# returns:
name ext price
date
2014-01-31 Will LLC 20953.87
2014-01-31 White-Trantow 13703.77
2014-01-31 Trantow-Barrows 14328.26
2014-01-31 Fritsch, Russel and Anderson 15130.77
2014-01-31 Kulas Inc 15398.87
2014-02-28 Will LLC 13613.06
2014-02-28 Kulas Inc 13973.65
2014-02-28 Cronin, Oberbrunner and Spencer 13976.26
2014-02-28 White-Trantow 11783.98
2014-02-28 Barton LLC 12218.03
...
2014-10-31 White-Trantow 19896.85
2014-10-31 Kulas Inc 15754.28
2014-10-31 Kuhn-Gusikowski 10559.77
2014-10-31 Jerde-Hilpert 11401.27
2014-10-31 Fritsch, Russel and Anderson 10686.18
2014-11-30 Keeling LLC 10310.96
2014-11-30 White-Trantow 9544.61
2014-11-30 Kassulke, Ondricka and Metz 8836.82
2014-11-30 Kuhn-Gusikowski 7197.89
2014-11-30 Frami, Hills and Schmidt 7221.02
2014-12-31 Frami, Hills and Schmidt 16249.81
2014-12-31 Pollich LLC 12357.76
2014-12-31 Will LLC 12561.21
2014-12-31 Kiehn-Spinka 17401.28
2014-12-31 Keeling LLC 13247.88
Upvotes: 1
Reputation: 2843
You can use the idxmax
function for this:
df[df.groupby('month')['ext price'].idxmax()]
Upvotes: 0