List Index Out of Range error - pandas

Question

I have two data frames. df1 looks like -

MovieName    Actors
lights out   Maria Bello
legend       Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis

df2 looks like -

ActorName    Gender
Tom          male
Emily        female
Christopher  male

I want to add two columns in df1 'female_actors' and 'male_actors' which contains the count of female and male actors in that particular movie respectively. Whether an actor is male or female is done based on df2.

Here is what I am doing -

def func(actors, gender):
    actors = [act.split()[0] for act in actors.split('*')]      
    n_gender = df2.Gender[df2.Gender==gender][df2.ActorName.isin(actors)].count()
    return n_gender

df1['male_actors'] = df1.Actors.apply(lambda x: func(x, 'male'))
df1['female_actors'] = df1.Actors.apply(lambda x: func(x, 'female'))

This code gives me list index out of range error.

Please note that -

If particular name isn't present in gender.csv, don't count it in the total. If there is just one actor in a movie, and it isn't present in gender.csv, then it's count should be zero.

Result should be -

MovieName    Actors      male_actors    female_actors
lights out   Maria Bello    0              0
legend       Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis    2    1

Feel free to suggest some other approach.

Alex · Accepted Answer

How about this?

df1['Male'] = df1.Actors.apply(lambda x: len(pd.concat( [df2[(df2.ActorName == name) & (df2.Gender == 'male')] for name in x.split('*')] )))
df1['Female'] = df1.Actors.apply(lambda x: len(pd.concat( [df2[(df2.ActorName == name) & (df2.Gender == 'female')] for name in x.split('*')] )))

List Index Out of Range error - pandas

Answers (2)

Related Questions