Han Zhengzu

Reputation: 3842

Tricky error when selecting the rows by certain value

I uploaded my files here and want to select certain samples (rows) by the value of certain features.

Here is my attempt:

df[df['Sample_ID'] == '160606-6']['OC_unc'].values
>array([ 0.9874218,  1.089288 ])
# now try to locate the sample with df['OC_unc'] == 0.9874218
df.loc[df['OC_unc'] == 0.9874218]
> No result

I don't know why this method fails with float data. When I select rows by string values, it always works.

Upvotes: 1

Views: 33

Answers (1)

brianpck

Reputation: 8254

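I suspect this has to do with how pandas rounds floats for display: the printed value is shortened, but the stored value keeps all of its digits, so comparing against the printed value finds no match. Consider the following example: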

>>> df = pd.DataFrame([1.00000000001], columns=['test'])
>>> df
   test
0   1.0
>>> df.loc[df['test'] == 1.0]
Empty DataFrame
Columns: [test]
Index: []

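One way to fix this is to increase the display precision using pd.set_option, so that the printed value shows the digits that are actually stored and you can compare against them:

>>> pd.set_option('display.precision', 15)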
>>> df
            test
0  1.00000000001
>>> df.loc[df['test'] == 1.00000000001]
            test
0  1.00000000001
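
If you would rather not depend on copying every stored digit, another option is to compare within a tolerance instead of testing exact equality. The following is only a minimal sketch: the two-row frame and its OC_unc values are made up to stand in for your file (I am assuming the stored values have more digits than the ones printed), and the 1e-7 tolerance is an arbitrary choice you would need to adapt to your data.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's file; the stored values
# are assumed to carry more digits than the default display shows.
df = pd.DataFrame({
    'Sample_ID': ['160606-6', '160606-6'],
    'OC_unc': [0.98742181234, 1.0892880456],
})

# Exact comparison against the rounded, printed value matches nothing.
exact = df.loc[df['OC_unc'] == 0.9874218]

# Comparison within an absolute tolerance selects the intended row.
close = df.loc[np.isclose(df['OC_unc'], 0.9874218, rtol=0, atol=1e-7)]

print(len(exact))
print(close)
```

np.isclose returns a boolean array, so it can be passed to df.loc like any other mask. Setting rtol=0 makes the check depend only on the absolute tolerance, which is easier to reason about when the values are all of similar size.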

Upvotes: 1
