pandas filter series with lists of strings as values

Question

So I'm trying to make a simple filter that will take in the dataframe and filter out all rows that don't have the target genre. It'll be easier to explain with the code:

    import pandas as pd

    test = [
        {"genre": ["RPG", "Shooter"]},
        {"genre": ["RPG"]},
        {"genre": ["Shooter"]},
    ]

    data = pd.DataFrame(test)

    fil = data.genre.isin(['RPG'])

I want the filter to return a dataframe with the following elements:

[{"genre":["RPG"]},
{"genre":["RPG", "Shooter"]}]

This is the error I'm getting when I try my code:

SystemError:  returned a result with an error set

Dani Mesejo · Accepted Answer

The problem is that each element of genre is a list, and isin compares whole elements against the given values, so it cannot test membership inside a list. Use:

    mask = data['genre'].apply(frozenset(['RPG']).issubset)
    print(data[mask])

Output

            genre
0  [RPG, Shooter]
1           [RPG]

The expression:

frozenset(['RPG']).issubset

checks that every element of the frozenset is present in each row's list. From the documentation of issubset:

Test whether every element in the set is in other.

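To make the mechanics concrete, here is a minimal sketch of what apply does with that bound method, written out as an explicit function. The helper name has_all_genres is hypothetical and only for illustration; the data is the same three-row frame from the question.

```python
import pandas as pd

# Same sample data as in the question.
data = pd.DataFrame({"genre": [["RPG", "Shooter"], ["RPG"], ["Shooter"]]})

target = frozenset(["RPG"])

def has_all_genres(row_genres):
    # Hypothetical helper: True when every genre in `target`
    # appears in this row's list.
    return target.issubset(row_genres)

# apply calls the function once per row, passing that row's list.
mask_explicit = data["genre"].apply(has_all_genres)

# Passing the bound method directly should give the same mask.
mask_bound = data["genre"].apply(target.issubset)

print(data[mask_explicit])
```

The bound-method form is shorter, while the explicit function may be easier to read or extend.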
So you could also check for multiple values easily, for example:

    mask = data['genre'].apply(frozenset(['RPG', 'Shooter']).issubset)
    print(data[mask])

Output

            genre
0  [RPG, Shooter]
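If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name filter_by_genres and its column parameter are assumptions for illustration, not part of pandas.

```python
import pandas as pd

def filter_by_genres(df, wanted, column="genre"):
    # Hypothetical helper: keep rows whose list in `column`
    # contains every value in `wanted`.
    required = frozenset(wanted)
    mask = df[column].apply(required.issubset)
    return df[mask]

data = pd.DataFrame({"genre": [["RPG", "Shooter"], ["RPG"], ["Shooter"]]})

rpg_rows = filter_by_genres(data, ["RPG"])
both_rows = filter_by_genres(data, ["RPG", "Shooter"])

print(rpg_rows)
print(both_rows)
```

Note that an empty wanted list keeps every row, since the empty set is a subset of any collection.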
