MaMo

Reputation: 575

Check equality of float columns in python

I have a dataframe with two columns containing floats. I first want to exclude rows where either column contains a zero, and then check, for each remaining row, whether the two values are equal.

I tried:

df.loc[(df['col1'] != 0.0) & (df['col2'] != 0.0), 'Error'] = np.where(assert_almost_equal(df['col1'], df['col2']), 'Not equal', '')

The result was:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I also tried:

np.where(df['col1'] == df['col2'], 'Not equal', '')

and

np.where(df.col1.eq(df.col2), 'Not equal', '')

and the result was:

ValueError: shape mismatch: value array of shape (24788,) could not be broadcast to indexing result of shape (9576,)

I also tried using the apply function.

How can I compare the floats within two columns row by row? I really do need exact equality, not isclose or something similar.

Thank you,

MaMo

Upvotes: 2

Views: 1890

Answers (3)

jezrael

Reputation: 862511

I think you need to chain all the masks together so that the boolean mask has the same length as the DataFrame. That avoids the shape mismatch ValueError and keeps the DataFrame at its original size:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0, 5, 4, 5.7, 5, 4],
                   'col2': [0, 0, 9, 5.7, 2, 3],
                   'col3': [1, 3, 5, 7, 1, 0]})

# One mask over all rows: both values non-zero and exactly equal
mask = (df['col1'] != 0.0) & (df['col2'] != 0.0) & (df['col1'] == df['col2'])
df['Error'] = np.where(mask, 'Equal', 'Not equal')
print(df)
   col1  col2  col3      Error
0   0.0   0.0     1  Not equal
1   5.0   0.0     3  Not equal
2   4.0   9.0     5  Not equal
3   5.7   5.7     7      Equal
4   5.0   2.0     1  Not equal
5   4.0   3.0     0  Not equal
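
If rows containing a zero should stay unlabelled rather than be marked 'Not equal', one possible variant is np.select, which takes several full-length conditions at once. This is only a sketch on the same sample data; the name has_zero and the choice of an empty string for excluded rows are assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0, 5, 4, 5.7, 5, 4],
                   'col2': [0, 0, 9, 5.7, 2, 3]})

# Rows where either column is zero are excluded from the comparison.
has_zero = (df['col1'] == 0.0) | (df['col2'] == 0.0)
# Exact, element-wise float equality (no tolerance).
equal = df['col1'] == df['col2']

# Conditions are checked in order; every array here has len(df) elements,
# so the result can be assigned to the full frame without a shape mismatch.
df['Error'] = np.select([has_zero, equal], ['', 'Equal'], default='Not equal')
print(df)
```

Because every condition is computed on the whole frame, the result always has as many elements as the DataFrame has rows.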

Upvotes: 3

Mohammed Elmahgiubi

Reputation: 641

How can I compare the floats within two columns row by row?

I would suggest using pandas apply, like so:

def compare_floats(row):
    return row['col1'] == row['col2'] # you can use any comparison you want here

df['col3'] = df.apply(compare_floats, axis=1)
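
For a plain equality check, the row-wise apply should give the same booleans as a direct column comparison, which avoids calling a Python function once per row. A small sketch on made-up data (the frame and the names by_apply and by_vector are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'col1': [0.0, 5.0, 4.0, 5.7],
                   'col2': [0.0, 0.0, 9.0, 5.7]})

# Row-wise version, as in the answer above: one Python call per row.
by_apply = df.apply(lambda row: row['col1'] == row['col2'], axis=1)

# Vectorized version: a single element-wise comparison of the two columns.
by_vector = df['col1'] == df['col2']

# Check that both approaches agree on every row.
print((by_apply == by_vector).all())
```

apply remains useful when the per-row logic cannot be written as column operations.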

Upvotes: 0

BENY

Reputation: 323226

Can you try this? Filter at the beginning:

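df = df.loc[(df['col1'] != 0.0) & (df['col2'] != 0.0), :].copy()
# assert_almost_equal returns None (it raises on mismatch), so it cannot be
# used as the np.where condition; compare the columns directly instead
df['Error'] = np.where(df['col1'] != df['col2'], 'Not equal', '')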

The reason for this error:

ValueError: shape mismatch: value array of shape (24788,) could not be broadcast to indexing result of shape (9576,)

You filter the rows on the left-hand side of the assignment, so the target is a subset of the original df, but inside np.where the df is still the full, original df. That is why the sizes differ.

24788: original size; 9576: size after excluding rows where a column contains zero.
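
The size mismatch can be reproduced on a tiny frame. The sketch below uses made-up data and hypothetical names (nonzero, attempt, sub): it first assigns full-length labels to a filtered selection, which should fail, and then builds the labels from the same filtered rows. The exact error text may vary between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0.0, 5.0, 4.0, 5.7],
                   'col2': [0.0, 0.0, 9.0, 5.7]})
df['Error'] = ''
nonzero = (df['col1'] != 0.0) & (df['col2'] != 0.0)  # selects 2 of the 4 rows

# Labels built from the full frame: 4 values for only 2 selected rows.
labels_full = np.where(df['col1'] == df['col2'], 'Equal', 'Not equal')
attempt = df.copy()  # work on a copy so the failed attempt leaves df untouched
try:
    attempt.loc[nonzero, 'Error'] = labels_full
except ValueError as exc:
    print('ValueError:', exc)

# Labels built from the same subset: 2 values for 2 selected rows.
sub = df.loc[nonzero]
df.loc[nonzero, 'Error'] = np.where(sub['col1'] == sub['col2'], 'Equal', 'Not equal')
print(df)
```

Applying the same mask on both sides of the assignment keeps the original row count while labelling only the non-zero rows.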

Upvotes: 2
