postcolonialist
postcolonialist

Reputation: 661

Count rows across columns in a dataframe if they are greater than another value

I have a data frame:

df = pd.DataFrame({ 
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'], 
    'Line_Item': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    '1Q16': [100, 120, 0, 200, 190, 210],
    '2Q16': [100, 0, 130, 200, 190, 210],
    '3Q16': [200, 250, 0, 120, 0, 190]})

I want an output that counts the number of rows in 1Q16, 2Q16, 3Q16 columns that are greater than 0. So the output could be a list with entries [5, 5, 4]

I tried .loc for slicing, but since zeros appear on different rows on different columns I can't do that. I want to loop through particular columns and count entries that are greater than 0

Upvotes: 1

Views: 127

Answers (2)

RavinderSingh13
RavinderSingh13

Reputation: 133508

With your shown samples, please try following. Using boolean masking here.

cols = ['1Q16','2Q16','3Q16']
df[cols].gt(0).sum()

Output will be as follows:

1Q16    5
2Q16    5
3Q16    4
dtype: int64

Simple explanation would be:

  • Creating list named cols which has column values in it(where we want to work upon).
  • Using gt(0) to get all elements which are greater than 0 in df's columns(given in cols list).
  • Then using sum function to sum all values more than 0 in given columns.

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195438

lst = df.filter(regex=r"\dQ\d+").gt(0).sum().tolist()
print(lst)

Prints:

[5, 5, 4]

Or:

lst = (df.loc[:, "1Q16":] > 0).sum().tolist()

Upvotes: 1

Related Questions