ryanf

Reputation: 3

Pandas str.contains produces unexpected results

I am trying to search a column in a pandas DataFrame (Python 3.8.8) to find the rows that contain particular substrings. Here is an example of the df column I'm searching.

print(df['fileName'])
0         data/0001_X+0Y-1-0.txt
1         data/0001_X+0Y-1-0.txt
2         data/0001_X+0Y-1-0.txt
3         data/0001_X+0Y-1-0.txt
4         data/0001_X+0Y-1-0.txt
                            ...                   
171721    data/2293_X-1Y-1-0.txt
171722    data/2293_X-1Y-1-0.txt
171723    data/2293_X-1Y-1-0.txt
171724    data/2293_X-1Y-1-0.txt
171725    data/2293_X-1Y-1-0.txt

Does anyone know why I am only able to return results for 1 out of 9 different strings I want to search for? I am certain that there aren't typos in my search strings. I've copy/pasted into my script and interactive python shell to be sure.

Returns df with correct number of rows: contain_values = df[df['fileName'].str.contains("X-1Y-1-0")]

Returns empty df: contain_values2 = df[df['fileName'].str.contains("X+0Y-1-0")]

Upvotes: 0

Views: 73

Answers (1)

Corralien

Reputation: 120399

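You have to disable regex on str.contains, because by default the pattern is treated as a regular expression and + is a quantifier meaning "one or more of the preceding character". The pattern X+0Y-1-0 therefore looks for one or more X followed directly by 0, and never matches a literal +: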

>>> df[df['fileName'].str.contains("X+0Y-1-0", regex=False)]

                 fileName
0  data/0001_X+0Y-1-0.txt
1  data/0001_X+0Y-1-0.txt
2  data/0001_X+0Y-1-0.txt
3  data/0001_X+0Y-1-0.txt
4  data/0001_X+0Y-1-0.txt
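
To see why the original search came back empty, here is a minimal sketch using only Python's re module, on the assumption that str.contains applies the same regex rules when regex=True (its default). The second file name, with a doubled X, is made up for illustration and does not come from the question's data.

```python
import re

name = "data/0001_X+0Y-1-0.txt"

# Unescaped: "X+" means "one or more X", so the regex wants X immediately
# followed by 0. In the real file name X is followed by "+", so no match.
unescaped = re.search("X+0Y-1-0", name)

# Escaped: "\+" matches a literal plus sign.
escaped = re.search(r"X\+0Y-1-0", name)

# Hypothetical name with no plus sign at all: the unescaped pattern matches it.
no_plus = re.search("X+0Y-1-0", "data/0001_XX0Y-1-0.txt")

print(unescaped)             # None
print(escaped is not None)   # True
print(no_plus is not None)   # True
```

The search for "X-1Y-1-0" worked because - has no special meaning outside a character class, so that string happens to be a valid literal pattern.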

Or, as suggested by @YusufErtas, escape the + sign with \+ (written "\\+" in a normal string, or r"\+" in a raw string):

>>> df[df['fileName'].str.contains("X\\+0Y-1-0")]

                 fileName
0  data/0001_X+0Y-1-0.txt
1  data/0001_X+0Y-1-0.txt
2  data/0001_X+0Y-1-0.txt
3  data/0001_X+0Y-1-0.txt
4  data/0001_X+0Y-1-0.txt
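
If there are several search strings (the question mentions nine), escaping each by hand is error-prone. A possible sketch, assuming the strings are meant to be matched literally, is to pass them through re.escape or to use regex=False throughout. The three-row frame below is a small stand-in built from file names shown in the question, and needle is a name chosen for this example.

```python
import re
import pandas as pd

# Small stand-in for the question's column (values taken from its printed output)
df = pd.DataFrame({"fileName": [
    "data/0001_X+0Y-1-0.txt",
    "data/0001_X+0Y-1-0.txt",
    "data/2293_X-1Y-1-0.txt",
]})

needle = "X+0Y-1-0"

# Option 1: treat the search string as plain text
literal = df[df["fileName"].str.contains(needle, regex=False)]

# Option 2: keep regex matching, but escape every special character first
escaped = df[df["fileName"].str.contains(re.escape(needle))]

# For comparison: the unescaped pattern from the question finds nothing
unescaped = df[df["fileName"].str.contains(needle)]

print(len(literal), len(escaped), len(unescaped))  # 2 2 0
```

regex=False is the simpler choice when no pattern features are needed; re.escape is useful when the literal string has to be embedded in a larger regular expression.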

Upvotes: 1
