peevee

Reputation: 25

Pandas: obtaining frequency of a specified value in a row across multiple columns

I have a large dataset with many columns of numeric data and want to be able to count all the zeros in each of the rows. The following will generate a small sample of the data.

    import numpy as np
    import pandas as pd
    df = pd.DataFrame(np.random.randint(0, 3, size=(8, 3)), columns=list('abc'))
    df


While I can create a column to sum all the values in the rows with the following code:

    df2=df.sum(axis=1)
    df2

And I can get a count of the zeros in a column:

    df.loc[df.a==1].count() 

I haven't been able to figure out how to get a count of the zeros across each of the rows. Any assistance would be greatly appreciated.

Upvotes: 1

Views: 72

Answers (1)

jezrael

Reputation: 862851

To count matching values, you can sum the True values of a boolean mask.
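As a minimal illustration of why summing a mask gives a count, here is a sketch on a tiny hand-built frame (the name demo and its values are made up for the example, not taken from the question). The comparison produces a boolean frame, and sum treats each True as 1:

```python
import pandas as pd

# Hypothetical three-row frame, built by hand so the result is easy to check.
demo = pd.DataFrame({"a": [0, 1, 2], "b": [1, 1, 0], "c": [0, 1, 1]})

mask = demo.eq(1)           # boolean DataFrame: True where the cell equals 1
per_row = mask.sum(axis=1)  # True counts as 1, so this counts matches per row

print(per_row.tolist())     # [1, 3, 1]
```

Passing axis=1 sums across the columns of each row; leaving it out sums down each column instead.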

If you need a new column:

    df['sum of 1'] = df.eq(1).sum(axis=1)
    # alternative
    # df['sum of 1'] = (df == 1).sum(axis=1)

Sample:

    np.random.seed(2020)
    df = pd.DataFrame(np.random.randint(0, 3, size=(8, 3)), columns=list('abc'))

    df['sum of 1'] = df.eq(1).sum(axis=1)
    print(df)
       a  b  c  sum of 1
    0  0  0  2         0
    1  1  0  1         2
    2  0  0  0         0
    3  2  1  2         1
    4  2  2  1         1
    5  0  0  0         0
    6  0  2  0         0
    7  1  1  1         3

If you need a new row:

    df.loc['sum of 1'] = df.eq(1).sum()
    # alternative
    # df.loc['sum of 1'] = (df == 1).sum()

Sample:

    np.random.seed(2020)
    df = pd.DataFrame(np.random.randint(0, 3, size=(8, 3)), columns=list('abc'))

    df.loc['sum of 1'] = df.eq(1).sum()
    print(df)
              a  b  c
    0         0  0  2
    1         1  0  1
    2         0  0  0
    3         2  1  2
    4         2  2  1
    5         0  0  0
    6         0  2  0
    7         1  1  1
    sum of 1  2  2  3
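The question asks about zeros, while the snippets here match the value 1; the same pattern should carry over by changing the value passed to eq. A rough sketch on a small hand-built frame (df_zero, cols and the column name n_zeros are illustrative choices, not from the question):

```python
import pandas as pd

# Hypothetical data, chosen by hand so the counts can be verified by eye.
df_zero = pd.DataFrame({"a": [0, 1, 0, 2],
                        "b": [0, 0, 0, 1],
                        "c": [2, 1, 0, 2]})

# Restrict the comparison to the data columns so that the new count column
# is not itself included if this line is run a second time.
cols = ["a", "b", "c"]
df_zero["n_zeros"] = df_zero[cols].eq(0).sum(axis=1)

print(df_zero["n_zeros"].tolist())  # [2, 1, 3, 0]
```

Selecting the columns explicitly is a design choice for safety on reruns; on a frame with only the numeric data columns, df.eq(0).sum(axis=1) alone would do the same job.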

Upvotes: 1
