Reputation: 199
I have a dataframe that looks something like this:
main_df:
student name | program_ids
-----------------------------
Alex | [1,2,7]
Tim | [37]
May | [17,1,11]
Gloria | NaN
James | [37,42]
Nina | []
prog_df:
prog_id | program
-------------------------
1 | Arts
2 | Music
37 | Languages
11 | Physics
17 | Chemistry
42 | Math
7 | Dance
I would like to match my "main_df" with "prog_df" on 'program_id' column such that:
I get a dataframe like this:
student name | program
-----------------------
Alex | Arts, Music, Dance
Tim | Languages
May | Chemistry, Arts, Physics
Gloria | NaN
James | Languages, Math
Nina | NaN
Is there way to match the the list elements of a pandas column with column values in another dataframe?
Thanks
Upvotes: 0
Views: 1066
Reputation: 402333
First, a little preprocessing:
df['program_ids'] = df['program_ids'].map(lambda x: [] if pd.isnull(x) else x)
df
student name program_ids
0 Alex [1, 2, 7]
1 Tim [37]
2 May [17, 1, 11]
3 Gloria []
4 James [37, 42]
5 Nina []
Next, create a mapping of program id to values:
mapping = dict(prog_df.values)
Use this to map IDs to programs with a list comprehension (for performance):
df['program_ids'] = [[mapping.get(x) for x in l] for l in df['program_ids']]
df
student name program_ids
0 Alex [Arts, Music, Dance]
1 Tim [Languages]
2 May [Chemistry, Arts, Physics]
3 Gloria []
4 James [Languages, Math]
5 Nina []
And finally, as an optional step, to join the lists, use str.join
:
df['program_ids'].str.join(',').replace('', np.nan)
0 Arts,Music,Dance
1 Languages
2 Chemistry,Arts,Physics
3 NaN
4 Languages,Math
5 NaN
Name: program_ids, dtype: object
Upvotes: 3
Reputation: 323226
You can do with
df1.loc[df1.program_ids.isnull(),'program_ids']=[[]]
d=dict(zip(df2.prog_id,df2.program))
df1['New']=[','.join([d.get(y) for y in x] )for x in df1.program_ids]
df1
Out[15]:
studentname program_ids New
0 Alex [1, 2, 7] Arts,Music,Dance
1 Tim [37] Languages
2 May [17, 1, 11] Chemistry,Arts,Physics
3 Gloria []
4 James [37, 42] Languages,Math
5 Nina []
Upvotes: 4