Reputation: 363
This seemingly simple problem has been frustrating me for a while. I am new to Python and can't get anywhere. I am trying to find a single integer, as an exact (full) match, in one specific column of a large dataframe (600,000 records and 56 columns). I have recreated a simplified example here.
For example: how do I find the number 5440499367 (2nd row) in this dataframe (epi) and return the results?
          KEY
0  5040501624
1  5440499367
2  5040484761
3  5404390876
4  5444456006
5  5040507739
epi.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 KEY 6 non-null int64
dtypes: int64(1)
memory usage: 176.0 bytes
Using:
epi["KEY"] == 5040501624
0 True
1 False
2 False
3 False
4 False
5 False
Name: KEY, dtype: bool
So that works: I have created a boolean series that finds the number. But when I try the same thing on the large dataframe (I can't show that data because it contains identifiers), as below, I get a warning and an empty result (0 rows x 56 columns).
found = df[df["KEY"] == "5440499367"]
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/computation/expressions.py:68: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
return op(a, b)
I have also tried:
5040501624 in epi["KEY"]
False
which I suspect is the wrong usage. I wondered whether this was a datatype issue, but I have checked that the dtype is Int64 and that none of the 600,000 rows has an empty value.
What is the simplest way to do this?
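On the `in` attempt: the Python `in` operator on a pandas Series tests membership in the index labels, not in the values (much like `in` on a dict tests keys). That is why `5040501624 in epi["KEY"]` returns False even though the value is in the column. A minimal sketch, rebuilding the small sample frame from the question (the row 0 value is inferred from the boolean output above), of membership tests that look at the values instead:

```python
import pandas as pd

# Sample frame from the question; row 0 inferred from the True at index 0 above
epi = pd.DataFrame({"KEY": [5040501624, 5440499367, 5040484761,
                            5404390876, 5444456006, 5040507739]})

target = 5040501624

# `in` checks the index labels (0..5), not the column values
print(target in epi["KEY"])             # False
print(0 in epi["KEY"])                  # True, because 0 is an index label

# Membership tests against the values themselves
print(target in epi["KEY"].values)      # True (NumPy array membership)
print((epi["KEY"] == target).any())     # True (any element of the mask)
print(epi["KEY"].isin([target]).any())  # True (isin accepts a list of keys)
```

`isin` is handy if you later need to look up several keys at once.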
Upvotes: 1
Views: 4127
Reputation: 2315
It looks like you're accidentally comparing with a string version of the integer you're looking for. Instead of:
found = df[df["KEY"] == "5440499367"]
you should do what you were originally trying with boolean masking:
found = df[df["KEY"] == 5440499367]
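A minimal sketch of the corrected filter, using a small hypothetical stand-in for the large `df` (only the KEY column matters here). It also shows one way to handle the case where the key arrives as text, by converting it with `int()` before comparing:

```python
import pandas as pd

# Hypothetical stand-in for the large frame; only KEY is needed for the filter
df = pd.DataFrame({"KEY": [5040501624, 5440499367, 5040484761]})

# int compared with int: the mask has exactly one True, so one row comes back
found = df[df["KEY"] == 5440499367]
print(found)

# If the key is a string (e.g. read from input), convert it before comparing
key_text = "5440499367"
found_from_text = df[df["KEY"] == int(key_text)]
print(len(found_from_text))  # 1
```

`df.loc[df["KEY"] == 5440499367]` is an equivalent spelling if you prefer explicit `.loc` indexing.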
As you said, the dtype of the "KEY" column is Int64, so you should be comparing an integer to an integer. This FutureWarning comes from numpy and often occurs when you compare a str to numeric values. A detailed explanation of the warning can be found here
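To see why the string comparison returned an empty frame, here is a minimal sketch on a small hypothetical stand-in for the large frame. Whether the FutureWarning is printed depends on the pandas/numpy versions (the question shows it under Python 3.7), but an int64 column never compares equal to a str, so the mask contains no True values:

```python
import pandas as pd

# Hypothetical stand-in for the large frame
df = pd.DataFrame({"KEY": [5040501624, 5440499367, 5040484761]})

# int64 column vs str: no element can be equal, so the mask is all False
mask_str = df["KEY"] == "5440499367"
print(mask_str.any())     # False
print(len(df[mask_str]))  # 0, i.e. the empty "0 rows x n columns" result

# Quick sanity check before filtering: compare the column dtype with the key type
print(df["KEY"].dtype)    # int64
```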
Upvotes: 1