Zanam
Zanam

Reputation: 4807

Count number of positive columns from large dataframe

I have a large DataFrame with 500 columns out of which 300 columns col1, col2, ..col300 appears as follows:

idx col1    col2
a   -1      4
b    2      1
c   -1     -1

I want to get the following for the 300 columns. Other 200 columns are variables I am not interested in:

idx col1    col2  numPos
a   -1      4     1
b    2      1     2
c   -1     -1     0

where for each row I want to get the number of positive values. I don't want to use the apply method as there are about 2 million rows in the DataFrame. Is there a pythonic way to do this?

Upvotes: 3

Views: 83

Answers (3)

tozCSS
tozCSS

Reputation: 6114

df['numPos'] = (df[cols] > 0).sum(axis=1) where cols is a list of column names. If the 300 columns are consecutive, then in place of df[cols] you can use df.iloc[:,start_offset:start_offset+300] where start_offset is the index of the first column.

Upvotes: 0

wwnde
wwnde

Reputation: 26676

Best way out is to exclude object columns, query what is greater than 0 in the df and sum along the row axis

  df['numPos']= df.select_dtypes(exclude='object').gt(0).sum(1)



  idx  col1  col2  status
0   a    -1     4       2
1   b     2     1       3
2   c    -1    -1       0

Upvotes: 0

user7864386
user7864386

Reputation:

You could select the columns + gt (which creates a boolean DataFrame where it's True if a value is positive) + sum on axis:

df['numPos'] = df[['col1','col2']].gt(0).sum(axis=1)

Maybe you could filter them too, like:

df['numPos'] = df.filter(like='col').gt(0).sum(axis=1)

Output:

  idx  col1  col2  numPos
0   a    -1     4       1
1   b     2     1       2
2   c    -1    -1       0

Upvotes: 2

Related Questions