Reputation: 22041
Season Name value
2001 arkansas 3.497
2002 arkansas 3.0935
2003 arkansas 3.3625
2015 arkansas 3.766
2001 colorado 2.21925
2002 colorado 1.4795
2010 colorado 2.89175
2011 colorado 2.48825
2012 colorado 2.08475
2013 colorado 1.68125
2014 colorado 2.5555
2015 colorado 2.48825
In the dataframe above, I want to identify top and bottom 10 percentile values in column value
for each state (arkansas and colorado). How do I do that? I can identify top and bottom percentile for entire value
column like so:
np.searchsorted(np.percentile(a, [10, 90]), a))
Upvotes: 1
Views: 6325
Reputation: 1467
If the variable for your dataframe is df, this should work. I'm not sure what you want your output to look like, but I just created code for a dictionary, where each key is a state. Also, since you have very few values, I used the option "nearest" for the argument interpolation (the default value is interpolation). To see the possible options, check out the documentation for the function here.
import pandas as pd
import numpy as np
df = pd.read_csv('stacktest.csv')
#array of unique state names from the dataframe
states = np.unique(df['Name'])
#empty dictionary
state_data = dict()
for state in states:
state_data[state] = np.percentile(df[df['Name'] == state]['value'],[10,90],interpolation = 'nearest')
print(state_data)
Upvotes: 0
Reputation: 403218
You can use groupby
+ quantile
:
df.groupby('Name')['value'].quantile([.1, .9])
Name
arkansas 0.1 3.174200
0.9 3.685300
colorado 0.1 1.620725
0.9 2.656375
Name: value, dtype: float64
And then call np.searchsorted
.
Alternatively, use qcut
.
df.groupby('Name').apply(lambda x:
pd.qcut(x['value'], [.1, .9]))
Name
arkansas 0 (3.173, 3.685]
1 NaN
2 (3.173, 3.685]
3 NaN
colorado 4 (1.62, 2.656]
5 NaN
6 NaN
7 (1.62, 2.656]
8 (1.62, 2.656]
9 (1.62, 2.656]
10 (1.62, 2.656]
11 (1.62, 2.656]
Name: value, dtype: object
Upvotes: 3