user12225102
user12225102

Reputation:

I want to count the Specific number in each column in Pandas DataFrame?

I want to count the number of times either of the specific values 4 & 5 appear in each column of a pandas DataFrame, proportionately.

Given this dataframe as input:

| A | B | C | D | E |
|---|---|---|---|---|
| 3 | 3 | 1 | 2 | 1 |
| 5 | 5 | 4 | 4 | 3 |
| 3 | 4 | 1 | 1 | 5 |

...first get raw counts per column, something like this...:

| A | B | C | D | E |
|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 |

...then I want to divide the counts for each column, to get proportionate counts:

Desired Output:

| A    | B    | C    | D    | E    |
|------|------|------|------|------|
| 0.33 | 0.66 | 0.33 | 0.66 | 0.33 |

I tried the following df.counts() but this only returns me the total no. of counts

Upvotes: 2

Views: 1628

Answers (4)

jezrael
jezrael

Reputation: 862611

Use DataFrame.isin for get mask, for count values use sum and for ratio use mean, last for one row DataFrame add Series.to_frame with transpose:

mask = df.isin([4,5])
df1 = mask.sum().to_frame().T
print (df1)
   A  B  C  D  E
0  1  2  1  1  1

df2 = mask.mean().to_frame().T
#or division
#df2 = df1.div(len(df))

print (df2)
          A         B         C         D         E
0  0.333333  0.666667  0.333333  0.333333  0.333333

EDIT: Solution with groupby by Head Manager is by create index by Head Manager, then test membership and count sum or mean:

print (df)
  Head Manager  A  B  C  D  E
0         man1  3  3  1  2  1
1         man1  5  5  4  4  3
2         man1  3  4  1  1  5

df11 = df.set_index('Head Manager').isin([4,5]).sum(level=0)
#alternative
#df11 = df.set_index('Head Manager').isin([4,5]).groupby(level=0).sum()
print (df11)
                A    B    C    D    E
Head Manager                         
man1          1.0  2.0  1.0  1.0  1.0

df22 = df.set_index('Head Manager').isin([4,5]).mean(level=0)
#alternative
#df22 = df.set_index('Head Manager').isin([4,5]).groupby(level=0).mean()
print (df22)
                     A         B         C         D         E
Head Manager                                                  
man1          0.333333  0.666667  0.333333  0.333333  0.333333

Alternative with groupby.apply, but is is slow if many groups:

cols = df.columns.difference(['Head Manager'])
df22 = df.groupby('Head Manager')[cols].apply(lambda x: x.isin([3,4]).mean())
print (df22)
                     A         B         C         D         E
Head Manager                                                  
man1          0.666667  0.666667  0.333333  0.333333  0.333333

If want specify some columns:

df22 = df.groupby('Head Manager')['A','B','C','D','E'].apply(lambda x: x.isin([3,4]).mean())

Upvotes: 3

Jonas Byström
Jonas Byström

Reputation: 26159

This will get you the occurrence of each number:

>>> pd.DataFrame([df[col].value_counts()/len(df) for col in df.columns])[[4,5]].T
      A         B   C         D
4.0 NaN  0.333333 NaN       NaN
5.0 NaN       NaN NaN  0.333333

Upvotes: 0

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

Straightforwardly:

In [54]: (df[(df == 4) | (df == 5)].count() / df.index.size).to_frame().T                                                    
Out[54]: 
          A         B         C         D         E
0  0.333333  0.666667  0.333333  0.333333  0.333333

Upvotes: 2

Elias Gudmundsen
Elias Gudmundsen

Reputation: 91

df = pd.DataFrame(data = [[3, 3, 1, 2, 1], [5, 5, 4, 4, 3], [3, 4, 1, 1, 5]], columns=['A', 'B', 'C', 'D', 'E'])
nums_to_count = [4, 5]
count_of_nums = df.isin(nums_to_count).sum()
total_nums = df.count()
share_of_nums = count_of_nums / total_nums

Where "share_of_nums" is you desired output.

or to keep it neat:

df = pd.DataFrame(data = [[3, 3, 1, 2, 1], [5, 5, 4, 4, 3], [3, 4, 1, 1, 5]], columns=['A', 'B', 'C', 'D', 'E'])
    nums_to_count = [4, 5]
    share_of_nums = df.isin(nums_to_count).sum() / df.count()

Upvotes: 1

Related Questions