Peregrin5

Reputation: 35

How can I use pandas.query() to check if a string exists in a list within the dataframe?

Let's say I have a dataframe that looks like:

    col1
0   ['str1', 'str2']
1   ['str3', 'str4']
2   []
3   ['str2', 'str4']
4   ['str1', 'str3']
5   []

I'm trying to craft a df.query() string that would be the equivalent of saying "'str3' in col1". So it would return:

    col1
1   ['str3', 'str4']
4   ['str1', 'str3']

I've tried df.query("col1.str.contains('str3')") but that results in

"None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n ...\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n dtype='float64', length=652)] are in the [index]"

I'm guessing this is because many of the lists in this column may be empty and get converted to NaN floats instead of strings?

It's highly preferable that I use query strings for this rather than list constructors, since I want this to be a script where other users can filter a dataframe using query strings that they craft themselves.

Upvotes: 1

Views: 669

Answers (2)

Ilya

Reputation: 1

Is it possible in your setup to use lambda functions? If so, the data could be filtered with a boolean mask:

df[df['col1'].apply(lambda x: 'str3' in x)]
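A minimal runnable sketch of the mask approach, assuming the column is named col1 as in the question and holds real Python lists (not their string representations). The sample data is copied from the question; the variable names mask and result are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question: a column of Python lists.
df = pd.DataFrame({
    "col1": [
        ["str1", "str2"],
        ["str3", "str4"],
        [],
        ["str2", "str4"],
        ["str1", "str3"],
        [],
    ]
})

# Boolean mask: True where the list in col1 contains 'str3'.
mask = df["col1"].apply(lambda x: "str3" in x)

# Keep only the rows where the mask is True.
result = df[mask]
print(result)
```

Empty lists simply produce False in the mask, so they need no special handling.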

Upvotes: 0

mozway

Reputation: 262124

If you have a Series of lists, you can't use query; instead, use boolean indexing with a list comprehension:

df[["str3" in x for x in df['col1']]]

If you need to chain the command, use loc with a lambda:

df.loc[lambda d: ["str3" in x for x in d['col1']]]

Output:

           col1
1  [str3, str4]
4  [str1, str3]
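Since the goal in the question is to let other users supply their own filters, one possible way to package the list comprehension is a small helper. This is only a sketch: rows_containing is a hypothetical name, not a pandas function, and the isinstance guard is an added assumption to tolerate cells that are not lists (for example NaN).

```python
import pandas as pd

def rows_containing(df, column, item):
    """Return the rows of df whose list in `column` contains `item`.

    Hypothetical helper, not part of pandas.
    """
    # Cells that are not list-like (e.g. NaN) are treated as non-matching.
    mask = [isinstance(x, (list, tuple, set)) and item in x for x in df[column]]
    return df[mask]

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [
        ["str1", "str2"],
        ["str3", "str4"],
        [],
        ["str2", "str4"],
        ["str1", "str3"],
        [],
    ]
})

# Direct call.
out = rows_containing(df, "col1", "str3")

# The same filter in a method chain, via DataFrame.pipe.
chained = df.pipe(rows_containing, "col1", "str3")
print(out)
```

Users would then pass a column name and a value instead of a query string, which avoids evaluating arbitrary expressions but is less flexible than query.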

Upvotes: 1
