Reputation:
I want to count the number of times either of the specific values 4 & 5 appear in each column of a pandas DataFrame, proportionately.
Given this dataframe as input:
| A | B | C | D | E |
|---|---|---|---|---|
| 3 | 3 | 1 | 2 | 1 |
| 5 | 5 | 4 | 4 | 3 |
| 3 | 4 | 1 | 1 | 5 |
...first get raw counts per column, something like this...:
| A | B | C | D | E |
|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 |
...then I want to divide the counts for each column, to get proportionate counts:
Desired Output:
| A | B | C | D | E |
|------|------|------|------|------|
| 0.33 | 0.66 | 0.33 | 0.66 | 0.33 |
I tried the following
df.counts()
but this only returns me the total no. of counts
Upvotes: 2
Views: 1628
Reputation: 862611
Use DataFrame.isin
for get mask, for count values use sum
and for ratio use mean
, last for one row DataFrame add Series.to_frame
with transpose:
mask = df.isin([4,5])
df1 = mask.sum().to_frame().T
print (df1)
A B C D E
0 1 2 1 1 1
df2 = mask.mean().to_frame().T
#or division
#df2 = df1.div(len(df))
print (df2)
A B C D E
0 0.333333 0.666667 0.333333 0.333333 0.333333
EDIT: Solution with groupby by Head Manager
is by create index by Head Manager
, then test membership and count sum
or mean
:
print (df)
Head Manager A B C D E
0 man1 3 3 1 2 1
1 man1 5 5 4 4 3
2 man1 3 4 1 1 5
df11 = df.set_index('Head Manager').isin([4,5]).sum(level=0)
#alternative
#df11 = df.set_index('Head Manager').isin([4,5]).groupby(level=0).sum()
print (df11)
A B C D E
Head Manager
man1 1.0 2.0 1.0 1.0 1.0
df22 = df.set_index('Head Manager').isin([4,5]).mean(level=0)
#alternative
#df22 = df.set_index('Head Manager').isin([4,5]).groupby(level=0).mean()
print (df22)
A B C D E
Head Manager
man1 0.333333 0.666667 0.333333 0.333333 0.333333
Alternative with groupby.apply
, but is is slow if many groups:
cols = df.columns.difference(['Head Manager'])
df22 = df.groupby('Head Manager')[cols].apply(lambda x: x.isin([3,4]).mean())
print (df22)
A B C D E
Head Manager
man1 0.666667 0.666667 0.333333 0.333333 0.333333
If want specify some columns:
df22 = df.groupby('Head Manager')['A','B','C','D','E'].apply(lambda x: x.isin([3,4]).mean())
Upvotes: 3
Reputation: 26159
This will get you the occurrence of each number:
>>> pd.DataFrame([df[col].value_counts()/len(df) for col in df.columns])[[4,5]].T
A B C D
4.0 NaN 0.333333 NaN NaN
5.0 NaN NaN NaN 0.333333
Upvotes: 0
Reputation: 92854
Straightforwardly:
In [54]: (df[(df == 4) | (df == 5)].count() / df.index.size).to_frame().T
Out[54]:
A B C D E
0 0.333333 0.666667 0.333333 0.333333 0.333333
Upvotes: 2
Reputation: 91
df = pd.DataFrame(data = [[3, 3, 1, 2, 1], [5, 5, 4, 4, 3], [3, 4, 1, 1, 5]], columns=['A', 'B', 'C', 'D', 'E'])
nums_to_count = [4, 5]
count_of_nums = df.isin(nums_to_count).sum()
total_nums = df.count()
share_of_nums = count_of_nums / total_nums
Where "share_of_nums" is you desired output.
or to keep it neat:
df = pd.DataFrame(data = [[3, 3, 1, 2, 1], [5, 5, 4, 4, 3], [3, 4, 1, 1, 5]], columns=['A', 'B', 'C', 'D', 'E'])
nums_to_count = [4, 5]
share_of_nums = df.isin(nums_to_count).sum() / df.count()
Upvotes: 1