capnahab

Reputation: 363

Python, Pandas: How to search for an integer in a column

This outwardly simple problem has been frustrating me for a while. I am new to Python and I can't get anywhere. I am looking to find a single integer in one specific column of a large dataframe (600,000 records and 56 columns). It must be a full match. I have recreated a simplified example here.

For example, how do I find the number 5440499367 (2nd row) in this dataframe (epi) and return the matching rows?

    KEY
0   

1   5440499367
2   5040484761
3   5404390876
4   5444456006
5   5040507739

epi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   KEY  6 non-null      int64
dtypes: int64(1)
memory usage: 176.0 bytes

Using

epi["KEY"] == 5040501624
0     True
1    False
2    False
3    False
4    False
5    False
Name: KEY, dtype: bool

So that's fine: I have created a boolean series which has found the number. But when I try to do the same in the large dataframe (I can't show that data because it contains identifiers), as in the example below, I get a warning and an empty result (0 rows x 56 columns).

found = df[df["KEY"] == "5440499367"] 

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/computation/expressions.py:68: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return op(a, b)

I have tried also

5040501624 in epi["KEY"]
False

I suspect this is the wrong way to use `in`. I wondered whether this was a datatype issue, so I checked: the dtype is Int64, and there are no empty values in the 600,000 rows.

What is the simplest way to do this?

Upvotes: 1

Views: 4127

Answers (1)

tania

Reputation: 2315

It looks like you're accidentally comparing with a string version of the integer you're looking for. Instead of:

found = df[df["KEY"] == "5440499367"] 

you should do what you were originally trying with boolean masking:

found = df[df["KEY"] == 5440499367] 
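Here is a minimal sketch of that mask on a small frame modelled on the sample in the question. The first KEY value is an assumption, taken from the comparison output shown there, and the variable names `target` and `mask` are only for illustration:

```python
import pandas as pd

# Small frame modelled on the question's sample.
# The first value is an assumption based on the comparison output shown there.
epi = pd.DataFrame({"KEY": [5040501624, 5440499367, 5040484761,
                            5404390876, 5444456006, 5040507739]})

target = 5440499367          # an int, not the string "5440499367"
mask = epi["KEY"] == target  # boolean Series, True where the value matches
found = epi[mask]            # keeps only the rows where the mask is True

print(found)
```

If the value you are searching for arrives as text (for example from user input), converting it with `int(...)` before the comparison should give the same result.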

As you said, the dtype of the "KEY" column is Int64, so you should be comparing an integer to an integer. This FutureWarning is coming from numpy, and often occurs when you compare a str to numeric values. A detailed explanation of the warning can be found here.
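On the `5040501624 in epi["KEY"]` attempt from the question: on a pandas Series, the `in` operator tests the index labels, not the stored values, which is why it returned False. Here is a rough sketch of the difference, using a made-up three-row frame and illustrative variable names:

```python
import pandas as pd

# Hypothetical three-row frame; the default index labels are 0, 1, 2.
epi = pd.DataFrame({"KEY": [5440499367, 5040484761, 5404390876]})

# `in` on a Series looks at the index labels, so this is False.
in_index = 5440499367 in epi["KEY"]

# An index label does match, even though it is not a KEY value.
label_hit = 1 in epi["KEY"]

# To test the values, check the underlying array or use isin().
in_values = 5440499367 in epi["KEY"].values
has_match = epi["KEY"].isin([5440499367]).any()

print(in_index, label_hit, in_values, has_match)
```

`isin` also accepts several values at once, which may be handy if you need to look up more than one key.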

Upvotes: 1
