Reputation: 3081
I have the following code:
import random
import pandas as pd
import numpy as np
def countCol(row):
count = 0
for c in range(1, 7):
if (row['D' + str(c)] < 0):
count = count + 1
return(count)
data = {'ID': random.sample(range(1, 50), 5),
'D1': random.sample(range(-5, 5), 5),
'D2': random.sample(range(-5, 5), 5),
'D3': random.sample(range(-5, 5), 5),
'D4': random.sample(range(-5, 5), 5),
'D5': random.sample(range(-5, 5), 5),
'D6': random.sample(range(-5, 5), 5)
}
df = pd.DataFrame(data, columns = ['ID', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6'])
df['Count'] = df.apply(countCol, axis=1)
display(df)
From the code it is obvious, I would like to count the number of columns that have a value smaller than zero. However, the code uses a for loop and the apply function, which makes me think this is a very inefficient approach to solve this problem.
Is there be a more vectorised or faster approach?
Upvotes: 0
Views: 276
Reputation: 26676
If you wanted to know specific columns which meet condition as well as records count, you can
df2=df.apply(lambda s: s.s<0)#Boolean select less than 0
c=df2.columns.to_numpy()#convert columns to numpy array
df2['negcount']=[c[i] for i in df2.to_numpy()]#Using list comprehension put columns which meet set conditions in a list
df2['count']=df2['negcount'].str.len()#Count elements in each list in each row
df=df.join(df2.iloc[:,-2:])#join a slice of df2 back to df1
print(df)
ID D1 D2 D3 D4 D5 D6 negcount count
0 24 3 1 -5 0 4 3 [D3] 1
1 10 1 -3 4 -1 -4 4 [D2, D4, D5] 3
2 30 -1 -1 3 4 2 1 [D1, D2] 2
3 48 -3 0 -3 -3 -2 -1 [D1, D3, D4, D5, D6] 5
4 9 4 -5 -1 -2 -5 -3 [D2, D3, D4, D5, D6] 5
Upvotes: 0
Reputation: 42886
Using DataFrame.filter
and DataFrame.lt
to get the correct columns and get booleans back where our conditions fit, then sum
to count the amount of columns:
df.filter(regex='D\d').lt(0).sum(axis=1)
0 4
1 5
2 1
3 2
4 4
dtype: int64
If your ID
column is always the first one, we can use DataFrame.iloc
instead:
df.iloc[:, 1:].lt(0).sum(axis=1)
0 4
1 5
2 1
3 2
4 4
dtype: int64
Upvotes: 2
Reputation: 7693
You can use apply
with lambda
like
df["Count"] = df.apply(lambda x: sum(x[1:] < 0) , 1)
df
ID D1 D2 D3 D4 D5 D6 Count
0 25 0 -3 4 -1 1 4 2
1 45 2 2 -1 -5 3 3 2
2 35 4 -4 -2 0 -4 -3 4
3 41 -3 4 3 4 -1 -1 3
4 34 3 -2 1 -4 2 1 2
Upvotes: 0