Reputation: 5574
Lets say I have a dataframe
df=pd.DataFrame({'Location': [ 'Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
'Bid': ['E','N','E','N','N','E', 'E',np.nan,np.nan,'A']})
Where S is sealed bids, N is people who did not bid, Nan is not present, and O is open bid.
I want to do a calculate the percentage of bidders where the equation would be (E+A)/(E+A+N). Is the best way to do a pivot table then implement the equation?
df=pd.DataFrame({'Location': [ 'Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
'Bid': ['E','N','E','N','N','E', 'E',np.nan,np.nan,'A']})
pt = df.pivot_table(rows='Location', cols='Bid', aggfunc='size', fill_value=0)
pt['Percentage']=(pt.A + pt.E)/(pt.A+pt.E+pt.N)
print (pt)
>>>
Bid A E N Percentage
Location
Ala 0 3 1 0.750000
SS 0 1 2 0.333333
TXE 1 0 0 1.000000
[3 rows x 4 columns]
Is this the best way to calculate percentage or is there a better way than pivot tables?
Upvotes: 0
Views: 8138
Reputation: 4328
Perhaps this isn't general enough but you can get the percentages with
counts = df3['Bid'].value_counts(normalize=True)
Then finding (E+A)
as a percentage of all bids is as simple as
counts.E + counts.A
If you don't want to include NaN
bids in the percentage calculation then
counts = df3['Bid'].dropna().value_counts(normalize=True)
and, if there are other bid types you need to exclude
all_allowable = df3['Bid'].isin(['E', 'A', 'N'])
counts = df3[all_allowable]['Bid'].value_counts(normalize=True)
To split by location
all_allowable = df3['Bid'].isin(['E', 'A', 'N'])
df3[all_allowable].groupby('Location')['Bid'].value_counts(normalize=True)
Upvotes: 8
Reputation: 78803
Your answer looks pretty good to me. It's very readable, which is obviously important.
If you want an alternative, you could look at groupby
, but, as I said, I think your own answer looks great:
>>> df=pd.DataFrame({'Location': [ 'Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
... 'Bid': ['E','N','E','N','N','E', 'E',np.nan,np.nan,'A']})
>>> df = df.set_index('Location')
>>> ean = df.groupby(level='Location').count()
>>> ea = df[df != 'N'].groupby(level='Location').count()
>>> ea.astype(float) / ean
Bid
Location
Ala 0.750000
SS 0.333333
TXE 1.000000
Upvotes: 0