Reputation: 1103
I have two data frames. df1 looks like -
MovieName Actors
lights out Maria Bello
legend Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis
df2 looks like -
ActorName Gender
Tom male
Emily female
Christopher male
I want to add two columns in df1 'female_actors' and 'male_actors' which contains the count of female and male actors in that particular movie respectively. Whether an actor is male or female is done based on df2.
Here is what I am doing -
def func(actors, gender):
actors = [act.split()[0] for act in actors.split('*')]
n_gender = df2.Gender[df2.Gender==gender][df2.ActorName.isin(actors)].count()
return n_gender
df1['male_actors'] = df1.Actors.apply(lambda x: func(x, 'male'))
df1['female_actors'] = df1.Actors.apply(lambda x: func(x, 'female'))
This code gives me list index out of range error.
Please note that -
If particular name isn't present in gender.csv, don't count it in the total. If there is just one actor in a movie, and it isn't present in gender.csv, then it's count should be zero.
Result should be -
MovieName Actors male_actors female_actors
lights out Maria Bello 0 0
legend Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis 2 1
Feel free to suggest some other approach.
Upvotes: 1
Views: 2424
Reputation: 294546
using str
and join
d2 = df2.set_index('ActorName')
d1 = df1.set_index('MovieName')
method 1
split
d1.join(d1.Actors.str.split('*', expand=True).stack() \
.str.split(expand=True)[0].map(d2.Gender) \
.groupby(level='MovieName') \
.value_counts().unstack()).fillna(0).reset_index()
method 2
extractall
d1.join(d1.Actors.str.extractall('((?P<first>[^*]+)\s+(?P<last>[^*]+))') \
['first'].map(d2.Gender).groupby(level='MovieName') \
.value_counts().unstack()).fillna(0).reset_index()
Upvotes: 1
Reputation: 12943
How about this?
df1['Male'] = df1.Actors.apply(lambda x: len(pd.concat( [df2[(df2.ActorName == name) & (df2.Gender == 'male')] for name in x.split('*')] )))
df1['Female'] = df1.Actors.apply(lambda x: len(pd.concat( [df2[(df2.ActorName == name) & (df2.Gender == 'female')] for name in x.split('*')] )))
Upvotes: 2