Reputation: 11321
Let's say I have a DataFrame that has lists as its values:
df = pd.DataFrame({'languages': [['en'], ['fr']], 'author': ['Dickens, Charles', 'Austen, Jane']})
I can query it for strings easily:
df[df['author'] == 'Dickens, Charles']
which correctly returns the subset of df that matches that criterion. But when the cell contents are lists, such as the languages column, whose values look like ['en'], I can't seem to search for them:
df[df['languages'] == ['en']]
I get:
ValueError: Arrays were different lengths: 2 vs 1
How can I query for contents that are a list?
Upvotes: 3
Views: 962
Reputation: 19
I typically use an isin() filter and pass a list as an argument.
lst = ['A', 'B']
df[df['column'].isin(lst)]
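For context, here is a minimal sketch of what isin() does, using the question's own DataFrame. It tests whether each cell's value is one of the items in the list passed in, so it fits scalar columns such as author. The cells of languages are themselves lists, so this is not the same thing as comparing a whole cell to ['en'], and it may not apply directly to that column.

```python
import pandas as pd

df = pd.DataFrame({
    'languages': [['en'], ['fr']],
    'author': ['Dickens, Charles', 'Austen, Jane'],
})

# isin() keeps rows whose cell value equals one of the listed items.
# Here the cells of 'author' are plain strings, so membership is well defined.
wanted = ['Dickens, Charles']
by_author = df[df['author'].isin(wanted)]
```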
Upvotes: -1
Reputation: 294488
We can use some trickery to get this to run faster. Note that this avoids the use of apply.
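import numpy as np

# create a length-1 NumPy object array whose single element is the list to match
c = np.empty(1, object)
c[0] = ['en']
# broadcasting compares each cell's list against c[0]
df[df.languages.values == c]
#              author languages
# 0  Dickens, Charles      [en]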
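A self-contained sketch of the same idea, assuming the DataFrame from the question. The names target, mask and result are only illustrative. The point of np.empty(1, object) is to get a one-element array that holds the list itself, so that NumPy broadcasts a list-to-list comparison across the column instead of treating ['en'] as an array of length 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'languages': [['en'], ['fr']],
    'author': ['Dickens, Charles', 'Austen, Jane'],
})

# One-element object array; its only element is the Python list ['en'].
target = np.empty(1, dtype=object)
target[0] = ['en']

# The column's values are an object array of lists, so == compares
# each cell's list with target[0] and yields one boolean per row.
mask = df['languages'].values == target
result = df[mask]
```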
Upvotes: 1
Reputation: 215057
What you might do is use the apply method to loop through the languages column and then compare the items:
df[df.languages.apply(lambda x: x == ['en'])]
# author languages
#0 Dickens, Charles [en]
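As a rough sketch of how the apply approach could be extended, the same pattern also covers "contains" queries as well as exact matches. The third row below is made up purely for illustration and is not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'languages': [['en'], ['fr'], ['en', 'fr']],  # third row is hypothetical
    'author': ['Dickens, Charles', 'Austen, Jane', 'Beckett, Samuel'],
})

# Exact match: the cell's list must equal ['en'] element for element.
exact = df[df['languages'].apply(lambda langs: langs == ['en'])]

# Membership: the cell's list only needs to include 'en'.
contains = df[df['languages'].apply(lambda langs: 'en' in langs)]
```

Both lambdas run once per row in Python, so this is likely slower than a vectorized comparison on large frames, but it is easy to adapt to other conditions.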
Upvotes: 4