Jonathan

Reputation: 11321

In pandas, how to query for a list?

Let's say I have a DataFrame that has lists as its values:

import pandas as pd

df = pd.DataFrame({'languages': [['en'], ['fr']], 'author': ['Dickens, Charles', 'Austen, Jane']})

I can query it for strings easily:

df[df['author'] == 'Dickens, Charles']

which correctly returns the subset of df that matches that criterion. But when the cell contents are lists, as in the languages column, whose values look like ['en'], I can't seem to search for them:

df[df['languages'] == ['en']]

I get:

ValueError: Arrays were different lengths: 2 vs 1

How can I query for contents that are a list?

Upvotes: 3

Views: 962

Answers (3)

user8143996

Reputation: 19

I typically use an isin() filter and pass a list as an argument.

lst = ['A', 'B']
df[df['column'].isin(lst)]

Upvotes: -1

piRSquared

Reputation: 294488

We can use a NumPy trick to get this to run faster. Note that this avoids the use of apply.

import numpy as np

# create a one-element NumPy object array whose single item is the list to match
c = np.empty(1, object)
c[0] = ['en']

df[df.languages.values == c]

             author languages
0  Dickens, Charles      [en]
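
For reference, here is a self-contained sketch of the same idea using the DataFrame from the question. The names target, mask and result are illustrative only. The assumption is that wrapping the list in a one-element object array makes NumPy treat it as a single value to broadcast against every row, rather than as a length-1 sequence to be aligned with the two rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'languages': [['en'], ['fr']],
    'author': ['Dickens, Charles', 'Austen, Jane'],
})

# Hypothetical name: a one-element object array holding the list to match.
# Building it with np.empty avoids NumPy unpacking the list into elements.
target = np.empty(1, dtype=object)
target[0] = ['en']

# Each cell (a Python list) is compared with target[0] using ==,
# and the single target value is broadcast across all rows.
mask = df['languages'].values == target
result = df[mask]
print(result)
```

The mask is a plain boolean array, so it can be combined with other conditions using & and | as usual.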

Upvotes: 1

akuiper

Reputation: 215057

One option is to use the apply method to loop through the languages column and compare each item with the target list:

df[df.languages.apply(lambda x: x == ['en'])]
#             author    languages
#0  Dickens, Charles    [en]
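
A runnable version of the line above, as a minimal sketch built on the DataFrame from the question. The names mask and matches are illustrative. The lambda runs an ordinary Python list comparison on each cell, so a row matches only when its list equals ['en'] exactly, including element order.

```python
import pandas as pd

df = pd.DataFrame({
    'languages': [['en'], ['fr']],
    'author': ['Dickens, Charles', 'Austen, Jane'],
})

# apply calls the lambda once per cell; each call returns True or False,
# giving a boolean Series aligned with df's index.
mask = df['languages'].apply(lambda langs: langs == ['en'])
matches = df[mask]
print(matches)
```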

Upvotes: 4
