Reputation: 1549
I have the following DataFrame and want to print only the rows that meet a condition.
import pandas as pd

pandas_dataframe = pd.DataFrame({'actors': [['Tom Hanks', 'Tim Allen', 'Don Rickles', 'Jim Varney'],
                                            ['Robin Williams', 'Jonathan Hyde', 'Kirsten Dunst'],
                                            ['Walter Matthau', 'Jack Lemmon', 'Sophia Loren', 'n', 'ix']],
                                 'movie': ['Toy Story', 'Jumanji', 'X-men']})
I want to print only the rows where the list of actors contains a name of length 1. Here that is only the third row, because it has at least one such name ('n').
+--------------------------------------------------------------+-------+
| actors                                                       | movie |
+--------------------------------------------------------------+-------+
| ['Walter Matthau', 'Jack Lemmon', 'Sophia Loren', 'n', 'ix'] | X-men |
+--------------------------------------------------------------+-------+
Upvotes: 1
Views: 49
Reputation: 13397
Try:
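import numpy as np

# explode() yields one row per actor; flag names that are one character long
mask = pandas_dataframe.actors.explode().str.len().eq(1)
# keep each original row once, even if several of its names match
res = pandas_dataframe.loc[np.unique(mask.loc[mask].index)]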
Outputs:
actors movie
2 [Walter Matthau, Jack Lemmon, Sophia Loren, n,... X-men
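To make the explode-based approach easier to follow, here is a step-by-step sketch of the same idea. It swaps the np.unique step for groupby(level=0).any(), which is a substitution of mine rather than part of the answer above, and the intermediate names (exploded, is_single_char, row_mask) are only illustrative.

```python
import pandas as pd

pandas_dataframe = pd.DataFrame({
    "actors": [
        ["Tom Hanks", "Tim Allen", "Don Rickles", "Jim Varney"],
        ["Robin Williams", "Jonathan Hyde", "Kirsten Dunst"],
        ["Walter Matthau", "Jack Lemmon", "Sophia Loren", "n", "ix"],
    ],
    "movie": ["Toy Story", "Jumanji", "X-men"],
})

# One row per actor; the original row label is repeated in the index.
exploded = pandas_dataframe["actors"].explode()

# True where a single name is exactly one character long.
is_single_char = exploded.str.len().eq(1)

# Collapse back to one boolean per original row: True if any name matched.
row_mask = is_single_char.groupby(level=0).any()

res = pandas_dataframe[row_mask]
print(res)
```

Because row_mask has one entry per original row, it can be used directly for boolean indexing without numpy.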
Upvotes: 2
Reputation: 26
You can use the code below, which filters the DataFrame in a single line (df here is the question's pandas_dataframe):
df[df.actors.apply(lambda o: any(len(x) == 1 for x in o))]
Upvotes: 0
Reputation: 3961
Use a lambda function and apply:
import pandas as pd
df = pd.DataFrame(
{
"actors": [
["Tom Hanks", "Tim Allen", "Don Rickles", "Jim Varney"],
["Robin Williams", "Jonathan Hyde", "Kirsten Dunst"],
["Walter Matthau", "Jack Lemmon", "Sophia Loren", "n", "ix"],
],
"movie": ["Toy Story", "Jumanji", "X-men"],
}
)
filt = df.actors.apply(lambda x: any(len(y) == 1 for y in x))
df = df[filt]
print(df)
Returning:
actors movie
2 [Walter Matthau, Jack Lemmon, Sophia Loren, n, ix] X-men
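If the actors column might contain missing values, the lambda above would raise on them. A minimal sketch of a more defensive variant follows, assuming such rows should simply be excluded; the helper name has_name_of_length and the extra row with None are hypothetical and not from the question.

```python
import pandas as pd


def has_name_of_length(names, length=1):
    """Return True if `names` is a list/tuple holding a string of the given length."""
    if not isinstance(names, (list, tuple)):
        return False  # e.g. None/NaN for a movie with no cast recorded
    return any(isinstance(n, str) and len(n) == length for n in names)


# Hypothetical data: the last row has no actor list at all.
df = pd.DataFrame({
    "actors": [["Tom Hanks", "Tim Allen"], ["Walter Matthau", "n", "ix"], None],
    "movie": ["Toy Story", "X-men", "Unknown"],
})

filtered = df[df["actors"].apply(has_name_of_length)]
print(filtered)
```

Passing a different length, for example df["actors"].apply(has_name_of_length, length=2), would select rows containing a two-character name instead.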
Upvotes: 0