Reputation: 575
I have a dataframe with two columns containing floats. I first excluded rows where either column contains a zero, and then wanted to check, for each row, whether the values in the two columns are equal.
I tried:
df.loc[(df['col1'] != 0.0) & (df['col2'] != 0.0), 'Error'] = np.where(assert_almost_equal(df['col1'], df['col2']), 'Not equal', '')
The result was:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
and also tried:
np.where(df['col1'] == df['col2'], 'Not equal', '')
and
np.where(df.col1.eq(df.col2), 'Not equal', '')
and the result was:
ValueError: shape mismatch: value array of shape (24788,) could not be broadcast to indexing result of shape (9576,)
I also tried the apply function.
How can I compare the floats within two columns row by row? I really do need exact equality, not isclose or something similar.
Thank you,
MaMo
Upvotes: 2
Views: 1890
Reputation: 862511
I think you need to chain all the masks together, so that the boolean mask has the same length as the DataFrame. This avoids the shape mismatch ValueError and does not change the original size of the DataFrame:
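import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0, 5, 4, 5.7, 5, 4],
                   'col2': [0, 0, 9, 5.7, 2, 3],
                   'col3': [1, 3, 5, 7, 1, 0]})

# Combine the zero filters and the equality test into one full-length mask
mask = (df['col1'] != 0.0) & (df['col2'] != 0.0) & (df['col1'] == df['col2'])
df['Error'] = np.where(mask, 'Equal', 'Not equal')
print(df)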
   col1  col2  col3      Error
0   0.0   0.0     1  Not equal
1   5.0   0.0     3  Not equal
2   4.0   9.0     5  Not equal
3   5.7   5.7     7      Equal
4   5.0   2.0     1  Not equal
5   4.0   3.0     0  Not equal
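Note that with the chained mask, rows containing a zero are also labelled 'Not equal'. If those rows should stay blank instead, one possible variation is sketched below. It is not part of the answer above: it assumes the same sample data, and the names `nonzero` and `equal` as well as the use of np.select are my own choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0, 5, 4, 5.7, 5, 4],
                   'col2': [0, 0, 9, 5.7, 2, 3]})

# Both masks are computed on the full DataFrame, so they have its length
nonzero = (df['col1'] != 0.0) & (df['col2'] != 0.0)
equal = df['col1'] == df['col2']

# Three outcomes: 'Equal', 'Not equal', or '' for rows that contain a zero
df['Error'] = np.select([nonzero & equal, nonzero & ~equal],
                        ['Equal', 'Not equal'],
                        default='')
print(df)
```

Because every condition has the same length as df, the assignment cannot run into the shape mismatch.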
Upvotes: 3
Reputation: 641
How can I compare the floats within two columns row by row?
I would suggest using pandas apply, like so:
def compare_floats(row):
    return row['col1'] == row['col2']  # you can use any comparison you want here

df['col3'] = df.apply(compare_floats, axis=1)
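For what it is worth, the row-wise apply and the vectorized comparison from the question should give the same booleans. The sketch below checks this on a small made-up frame (the sample values and the variable names `row_wise` and `vectorized` are assumptions for illustration). On large frames the vectorized form is usually much faster, since apply with axis=1 calls the Python function once per row.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 5, 4, 5.7, 5, 4],
                   'col2': [0, 0, 9, 5.7, 2, 3]})

def compare_floats(row):
    # Exact equality of the two floats in this row
    return row['col1'] == row['col2']

row_wise = df.apply(compare_floats, axis=1)   # one Python call per row
vectorized = df['col1'].eq(df['col2'])        # one call for the whole column

# Both approaches agree element by element
print((row_wise == vectorized).all())
```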
Upvotes: 0
Reputation: 323226
Can you try this? Filter at the beginning:
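# Filter first, so that everything afterwards works on the same subset
df = df.loc[(df['col1'] != 0.0) & (df['col2'] != 0.0), :].copy()
# Exact comparison, as asked; assert_almost_equal raises instead of returning a mask
df['Error'] = np.where(df['col1'] != df['col2'], 'Not equal', '')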
The reason for
ValueError: shape mismatch: value array of shape (24788,) could not be broadcast to indexing result of shape (9576,)
is that the filter is applied only on the left-hand side of the assignment. There, df.loc[...] selects a subset of the original df, but the np.where on the right-hand side is still computed on the whole original df, so the two sides have different sizes.
24788: original size; 9576: size after excluding rows where the columns contain zeros.
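If the original DataFrame should keep all of its rows, another option is to apply the same mask on both sides of the assignment. This is only a sketch under assumptions: the sample data, the names `mask` and `sub`, and the empty-string initialisation of the 'Error' column are mine, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0, 5, 4, 5.7, 5, 4],
                   'col2': [0, 0, 9, 5.7, 2, 3]})

mask = (df['col1'] != 0.0) & (df['col2'] != 0.0)

# Start with an empty label everywhere, so excluded rows stay blank
df['Error'] = ''

# Compute the comparison on the same subset that is being assigned to,
# so both sides of the assignment have the same number of rows
sub = df.loc[mask]
df.loc[mask, 'Error'] = np.where(sub['col1'] != sub['col2'], 'Not equal', '')
print(df)
```

Here the right-hand side has exactly as many elements as there are rows selected by mask, which is what the failing version was missing.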
Upvotes: 2