Reputation: 5224
Say, I have the data like this:
col1 col2 other columns..
0 0 ...
0 0 ...
0 0 ...
0 0 ...
0 0 ...
0 0 ...
0 0 ...
0 0 ...
0 0 ...
0 0 ...
0 1 ...
0 1 ...
0 1 ...
0 1 ...
0 1 ...
0 1 ...
1 0 ...
1 0 ...
etc...
Data has been grouped by 2 columns (it's already result by grouping):
gr = df.groupby(['col1', 'col2']).size()
col1 col2
0 0 10
1 5
1 0 2
1 16
2 0 10
So now I need to figure out which percentage of each subgroup the count has respectively the whole group by 2 columns:
I need to add one more column, or transform to Series (better) to have a percentage of col2 respectively the group (col1) like:
col1 col2
0 0 0.66
1 0.33
1 0 0.1
1 0.9
2 0 1
Or it can be separate Series for each group: [0.66 0.1 1]
and [0.33 0.9]
.
How to implement it?
So let me describe the meaning of this data. For example it can be subjects (0,1,2), results (0 or 1) and amount students per subject per result. So the whole idea is to figure out what percentage of students failed/passed for subjects 0,1, etc.
One more additional thing - sometimes there is only one result (0 or 1) like the subject that all students passed, and I still need to be able to tell that for this subject percentage of 0 is 0,0 and of 1 is 1.
Upvotes: 4
Views: 3616
Reputation: 862661
You need groupby
by first level of index with sum
:
gr = df.groupby(['col1', 'col2']).size()
print (gr)
col1 col2
0 0 10
1 5
1 0 2
1 16
2 0 10
dtype: int64
print (gr.groupby(level=0).sum())
col1
0 15
1 18
2 10
dtype: int64
print (gr / gr.groupby(level=0).sum())
col1 col2
0 0 0.666667
1 0.333333
1 0 0.111111
1 0.888889
2 0 1.000000
dtype: float64
For storing Series
use dict comprehension:
dfs = {i:g.reset_index(drop=True) for i, g in g1.groupby(level=1)}
print (dfs[0])
0 0.666667
1 0.111111
2 1.000000
dtype: float64
print (dfs[1])
0 0.333333
1 0.888889
dtype: float64
Upvotes: 9
Reputation: 2150
You might be able to try this:
df = pd.DataFrame({'A':[0,1,0,1,0],'B':[10,5,2,16,10]}, index=[0,1,0,1,0])
df2 = df.ix[0] / df.ix[0].sum()
df3 = df.ix[1] / df.ix[1].sum()
Hope this will help.
Upvotes: 1