FallingInForward

Reputation: 315

pandas filter series with lists of strings as values

So I'm trying to make a simple filter that takes a dataframe and drops every row whose genre list doesn't contain the target genre. It'll be easier to explain with the code:

import pandas as pd

test = [
    {"genre": ["RPG", "Shooter"]},
    {"genre": ["RPG"]},
    {"genre": ["Shooter"]},
]

data = pd.DataFrame(test)

fil = data.genre.isin(['RPG'])

I want the filter to return a dataframe with the following elements:

[{"genre":["RPG"]},
{"genre":["RPG", "Shooter"]}]

This is the error I'm getting when I try my code:

SystemError: <built-in method view of numpy.ndarray object at 0x00000180D1DF2760> returned a result with an error set

Upvotes: 0

Views: 80

Answers (2)

Dani Mesejo

Reputation: 61910

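The problem is that the elements of genre are lists, so isin does not work: it compares each element as a whole against the given values, and here each element is a list rather than a string. Use: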

mask = data['genre'].apply(frozenset(['RPG']).issubset)
print(data[mask])

Output

            genre
0  [RPG, Shooter]
1           [RPG]

The expression:

frozenset(['RPG']).issubset

Checks, for each row, that every element of the frozenset is contained in that row's list. From the documentation:

Test whether every element in the set is in other.
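As a quick illustration outside pandas, here is a minimal sketch of what issubset does when it is handed a plain list (the method form accepts any iterable). The variable names are made up for this example:

```python
# Hypothetical names, for illustration only
required = frozenset(["RPG"])

# issubset accepts any iterable, so a list works directly
in_first = required.issubset(["RPG", "Shooter"])   # every required genre is present
in_third = required.issubset(["Shooter"])          # "RPG" is missing

# With two required genres, a row holding only one of them does not match
both = frozenset(["RPG", "Shooter"])
both_in_second = both.issubset(["RPG"])

print(in_first, in_third, both_in_second)
```

This is the same call that apply makes once per row in the mask above.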

So you could also check for multiple values easily, for example:

mask = data['genre'].apply(frozenset(['RPG', "Shooter"]).issubset)
print(data[mask])

Output

            genre
0  [RPG, Shooter]
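
Note that issubset requires all of the listed genres to be present. If you instead want rows that contain at least one of several genres, one possible sketch is to negate isdisjoint. The `wanted` set and the "Strategy" value are assumptions added for this example and are not part of the question:

```python
import pandas as pd

data = pd.DataFrame([
    {"genre": ["RPG", "Shooter"]},
    {"genre": ["RPG"]},
    {"genre": ["Shooter"]},
])

# Hypothetical set of acceptable genres; "Strategy" is not in the data
wanted = frozenset(["RPG", "Strategy"])

# isdisjoint is True when a row shares no genre with `wanted`,
# so negating it keeps rows with at least one match
any_mask = ~data["genre"].apply(wanted.isdisjoint)
any_rows = data[any_mask]
print(any_rows)
```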

Upvotes: 1

Quang Hoang

Reputation: 150735

You want:

data[data.genre.apply(lambda x: 'RPG' in x)]

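Or (the level argument of Series.any was removed in pandas 2.0, so group on the index level instead):

data[data.genre.explode().eq('RPG').groupby(level=0).any()]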

Output:

            genre
0  [RPG, Shooter]
1           [RPG]
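
To make the explode approach easier to follow, here is a step-by-step sketch with the intermediate values named. The intermediate variable names are made up for this example, and collapsing with groupby(level=0).any() is a choice made here because it works on current pandas versions:

```python
import pandas as pd

data = pd.DataFrame([
    {"genre": ["RPG", "Shooter"]},
    {"genre": ["RPG"]},
    {"genre": ["Shooter"]},
])

# explode() gives each list element its own row and repeats the
# original row label, so the index here becomes 0, 0, 1, 2
exploded = data["genre"].explode()

# Element-wise comparison against the target genre
is_rpg = exploded.eq("RPG")

# Collapse back to one boolean per original row:
# True if any element of that row's list matched
mask = is_rpg.groupby(level=0).any()

result = data[mask]
print(result)
```

The lambda version is usually simpler for a single membership test; the explode version keeps everything in vectorized pandas operations at the cost of building a longer intermediate Series.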

Upvotes: 0
