Reputation: 35
Let's say I have a dataframe that looks like:
col1
0 ['str1', 'str2']
1 ['str3', 'str4']
2 []
3 ['str2', 'str4']
4 ['str1', 'str3']
5 []
I'm trying to craft a df.query() string that would be the equivalent of saying "'str3' in col1". So it would return:
col1
1 ['str3', 'str4']
4 ['str1', 'str3']
I've tried df.query("col1.str.contains('str3')") but that results in
"None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n ...\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n dtype='float64', length=652)] are in the [index]"
I'm guessing because many of the lists in this column may be empty and convert to nan floats instead of strings?
It's highly preferable that I use query strings for this rather than list constructors, since I want this to be a script where other users can filter a dataframe using query strings that they craft themselves.
Upvotes: 1
Views: 669
Reputation: 1
Is it possible in your setup to use lambda functions? It seems the data could be filtered with a boolean mask:
df['col1'].apply(lambda x: 'str3' in x)
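As a rough sketch of how that mask could be used to filter, assuming the dataframe from the question (rebuilt by hand here) and assuming every cell in col1 is meant to be a list. The isinstance guard is my own addition for cells that might be NaN instead of a list; it is not something the question confirms is needed.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"col1": [["str1", "str2"], ["str3", "str4"], [], ["str2", "str4"], ["str1", "str3"], []]}
)

# Boolean mask: True where the list in col1 contains 'str3'.
# The isinstance check is an assumption: it skips any non-list cell (e.g. NaN).
mask = df["col1"].apply(lambda x: isinstance(x, list) and "str3" in x)

# Index the dataframe with the mask to keep only the matching rows.
filtered = df[mask]
print(filtered)
```

Empty lists simply evaluate to False in the mask, so they are dropped along with the other non-matching rows.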
Upvotes: 0
Reputation: 262124
If you have a Series of lists, you can't use query; instead, go for boolean indexing with a list comprehension.
df[["str3" in x for x in df['col1']]]
If you need to chain the command, use loc with a lambda:
df.loc[lambda d: ["str3" in x for x in d['col1']]]
Output:
col1
1 [str3, str4]
4 [str1, str3]
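For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the question, and filter_contains is a hypothetical helper name of my own (not a pandas function), shown only as one possible way to let other users pass a column name and a value instead of writing the comprehension themselves.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"col1": [["str1", "str2"], ["str3", "str4"], [], ["str2", "str4"], ["str1", "str3"], []]}
)

# Plain boolean indexing with a list comprehension.
result = df[["str3" in x for x in df["col1"]]]

# Chainable variant: loc accepts a callable that returns the boolean list.
chained = df.loc[lambda d: ["str3" in x for x in d["col1"]]]


def filter_contains(frame, column, value):
    """Hypothetical helper: keep rows whose list in `column` contains `value`."""
    return frame[[value in x for x in frame[column]]]


by_helper = filter_contains(df, "col1", "str3")
print(by_helper)
```

All three variants select rows 1 and 4 of the sample data.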
Upvotes: 1