Reputation: 1093
I have a dataframe df with two columns called 'MovieName' and 'Actors'. It looks like:
MovieName Actors
lights out Maria Bello
legend Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis
Please note that different actor names are separated by '*'. I have another csv file called gender.csv which has the gender of all actors based on their first names. gender.csv looks like -
ActorName Gender
Tom male
Emily female
Christopher male
I want to add two columns in my dataframe 'female_actors' and 'male_actors' which contains the count of female and male actors in that particular movie respectively.
How do I achieve this task using both df and gender.csv in pandas?
Please note that -
Result of above example should be -
MovieName Actors male_actors female_actors
lights out Maria Bello 0 0
legend Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis 2 1
Upvotes: 1
Views: 88
Reputation: 2623
import pandas as pd
df1 = pd.DataFrame({'MovieName': ['lights out', 'legend'], 'Actors':['Maria Bello', 'Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis']})
df2 = pd.DataFrame({'ActorName': ['Tom', 'Emily', 'Christopher'], 'Gender':['male', 'female', 'male']})
def func(actors, gender):
actors = [act.split()[0] for act in actors.split('*')]
n_gender = df2.Gender[df2.Gender==gender][df2.ActorName.isin(actors)].count()
return n_gender
df1['male_actors'] = df1.Actors.apply(lambda x: func(x, 'male'))
df1['female_actors'] = df1.Actors.apply(lambda x: func(x, 'female'))
df1.to_csv('res.csv', index=False)
print df1
Output
Actors,MovieName,male_actors,female_actors
Maria Bello,lights out,0,0
Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis,legend,2,1
Upvotes: 3