user9463814
user9463814

Reputation: 199

How to match a list column with a column in another dataframe in pandas?

I have a dataframe that looks something like this:

main_df:

student name | program_ids
-----------------------------
Alex         | [1,2,7]
Tim          | [37]
May          | [17,1,11]
Gloria       | NaN
James        | [37,42]
Nina         | []

prog_df:

 prog_id    | program
 -------------------------
 1          | Arts
 2          | Music
 37         | Languages
 11         | Physics
 17         | Chemistry
 42         | Math
 7          | Dance

I would like to match my "main_df" with "prog_df" on 'program_id' column such that:

I get a dataframe like this:

student name | program
-----------------------
Alex         | Arts, Music, Dance
Tim          | Languages
May          | Chemistry, Arts, Physics
Gloria       | NaN
James        | Languages, Math
Nina         | NaN

Is there way to match the the list elements of a pandas column with column values in another dataframe?

Thanks

Upvotes: 0

Views: 1066

Answers (2)

cs95
cs95

Reputation: 402333

First, a little preprocessing:

df['program_ids'] = df['program_ids'].map(lambda x: [] if pd.isnull(x) else x)
df

  student name  program_ids
0         Alex    [1, 2, 7]
1          Tim         [37]
2          May  [17, 1, 11]
3       Gloria           []
4        James     [37, 42]
5         Nina           []

Next, create a mapping of program id to values:

mapping = dict(prog_df.values)

Use this to map IDs to programs with a list comprehension (for performance):

df['program_ids']  = [[mapping.get(x) for x in  l] for l in df['program_ids']]
df

  student name                 program_ids
0         Alex        [Arts, Music, Dance]
1          Tim                 [Languages]
2          May  [Chemistry, Arts, Physics]
3       Gloria                          []
4        James           [Languages, Math]
5         Nina                          []

And finally, as an optional step, to join the lists, use str.join:

df['program_ids'].str.join(',').replace('', np.nan)

0          Arts,Music,Dance
1                 Languages
2    Chemistry,Arts,Physics
3                       NaN
4            Languages,Math
5                       NaN
Name: program_ids, dtype: object

Upvotes: 3

BENY
BENY

Reputation: 323226

You can do with

df1.loc[df1.program_ids.isnull(),'program_ids']=[[]]
d=dict(zip(df2.prog_id,df2.program))
df1['New']=[','.join([d.get(y) for y in x] )for x in df1.program_ids]
df1
Out[15]: 
  studentname  program_ids                     New
0        Alex    [1, 2, 7]        Arts,Music,Dance
1         Tim         [37]               Languages
2         May  [17, 1, 11]  Chemistry,Arts,Physics
3      Gloria           []                        
4       James     [37, 42]          Languages,Math
5        Nina           []                        

Upvotes: 4

Related Questions