Reputation: 77
I have a dataframe
       A      B      C
0   True   True   True
1   True  False  False
2  False  False  False
I would like to add a column D with the following condition:
D is True if A, B and C are all True; otherwise D is False.
I tried
df['D'] = df.loc[(df['A'] == True) & df['B'] == True & df['C'] == True]
I get
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
Then I tried to follow this example and wrote a similar function as suggested in the link:
def all_true(row):
    if row['A'] == True:
        if row['B'] == True:
            if row['C'] == True:
                val = True
    else:
        val = 0
    return val
df['D'] = df.apply(all_true(df), axis=1)
In which case I get
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I'd appreciate suggestions. Thanks!
Upvotes: 5
Views: 2544
Reputation: 862681
Comparing with True is not necessary; just chain the boolean masks with &:
df['D'] = df['A'] & df['B'] & df['C']
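As a minimal sketch on the example frame from the question: if the comparisons with True are kept anyway, each one needs its own parentheses, because in Python & binds more tightly than ==. The names mask_compared and mask_chained are only illustrative.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "A": [True, True, False],
    "B": [True, False, False],
    "C": [True, False, False],
})

# & has higher precedence than ==, so every comparison is parenthesized
mask_compared = (df["A"] == True) & (df["B"] == True) & (df["C"] == True)

# With boolean columns the comparisons are redundant
mask_chained = df["A"] & df["B"] & df["C"]

# Assign the boolean Series directly; df.loc[...] would select rows instead
df["D"] = mask_chained
```

Both masks should be identical here, so the shorter chained form is enough when the columns are already boolean.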
If performance is important:
df['D'] = df['A'].values & df['B'].values & df['C'].values
Or use DataFrame.all to check whether all values in each row are True:
df['D'] = df[['A','B','C']].all(axis=1)
#numpy all
#df['D'] = np.all(df.values,1)
print (df)
       A      B      C      D
0   True   True   True   True
1   True  False  False  False
2  False  False  False  False
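For completeness, the row-wise approach from the question can probably be made to work as well. The ValueError comes from calling all_true(df) with the whole frame; apply expects the function itself and calls it once per row. This is only a sketch with a simplified function body, and row-wise apply is typically much slower than the vectorized versions above.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "A": [True, True, False],
    "B": [True, False, False],
    "C": [True, False, False],
})

def all_true(row):
    # row is a single row, passed as a Series indexed by column name
    return bool(row["A"] and row["B"] and row["C"])

# Pass the function object (no parentheses); axis=1 means "once per row"
df["D"] = df.apply(all_true, axis=1)
```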
Performance:
import numpy as np
import pandas as pd
import perfplot

np.random.seed(125)

# Each kernel adds column D using a different technique
def all1(df):
    df['D'] = df.all(axis=1)
    return df

def all1_numpy(df):
    df['D'] = np.all(df.values, 1)
    return df

def eval1(df):
    df['D'] = df.eval('A & B & C')
    return df

def chained(df):
    df['D'] = df['A'] & df['B'] & df['C']
    return df

def chained_numpy(df):
    df['D'] = df['A'].values & df['B'].values & df['C'].values
    return df

# Build a random boolean frame with n rows
def make_df(n):
    df = pd.DataFrame({'A': np.random.choice([True, False], size=n),
                       'B': np.random.choice([True, False], size=n),
                       'C': np.random.choice([True, False], size=n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[all1, all1_numpy, eval1, chained, chained_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
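If perfplot is not available, a rough comparison can be sketched with the standard library's timeit instead. The frame size, the repeat counts and the names candidates and timings are arbitrary assumptions for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(125)
n = 100_000  # arbitrary size chosen for this sketch

# Random boolean frame with columns A, B, C
df = pd.DataFrame(rng.choice([True, False], size=(n, 3)), columns=list("ABC"))

# Each candidate computes the combined mask without modifying df
candidates = {
    "all": lambda: df[["A", "B", "C"]].all(axis=1),
    "all_numpy": lambda: np.all(df[["A", "B", "C"]].to_numpy(), axis=1),
    "chained": lambda: df["A"] & df["B"] & df["C"],
    "chained_numpy": lambda: df["A"].to_numpy() & df["B"].to_numpy() & df["C"].to_numpy(),
}

# Best of 3 repeats, 10 calls each
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {seconds:.4f} s per 10 calls")
```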
Upvotes: 5
Reputation: 13255
Using pandas eval:
df['D'] = df.eval('A & B & C')
Or:
df = df.eval('D = A & B & C')
# alternative, in place: df.eval('D = A & B & C', inplace=True)
Or:
df['D'] = np.all(df.values,1)
print(df)
       A      B      C      D
0   True   True   True   True
1   True  False  False  False
2  False  False  False  False
Upvotes: 2
Reputation: 71580
Or even better:
df['D'] = df.all(axis=1)
Now:
print(df)
gives:
       A      B      C      D
0   True   True   True   True
1   True  False  False  False
2  False  False  False  False
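One caveat, as a hedged sketch: all over the whole frame checks every column, so if the frame holds anything besides A, B and C, it is probably safer to select those columns first. The score column and the cols list below are hypothetical and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [True, True, False],
    "B": [True, False, False],
    "C": [True, False, False],
    "score": [0, 1, 2],  # hypothetical extra column, not in the question
})

# Restrict the check to the boolean columns of interest
cols = ["A", "B", "C"]
df["D"] = df[cols].all(axis=1)
```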
Upvotes: 5