Azee.
Azee.

Reputation: 723

Creating a list column in a dataframe based on values in another dataframe

I have two DataFrames:

df1:

       node        ids
0   ab          [978]
1   bc          [978, 121]

df2:

       name        id
0   alpha          978
1   bravo          121

I would like to add a new column called names in df1 where I get the list of names corresponding to ids column like this

   node            ids             names
0   ab            [978]            [alpha]
1   bc            [978, 121]       [alpha,bravo]

Would apprreciate help.

Upvotes: 1

Views: 526

Answers (2)

Celius Stingher
Celius Stingher

Reputation: 18367

How about you try with this alternative solution?

df1 = (df1.reset_index()).merge(
        ((df1['ids'].explode().reset_index()).merge(
                df2,how='left',left_on='ids',right_on='id').groupby('index')['name','ids'].agg(
                        lambda x: list(x)).reset_index()),
                how='left',on='index').drop(
                        columns=['index','ids_y']).rename(
                                columns={'ids_x':'ids'})
print(df1)

Output:

  node         ids            name
0   ab       [978]         [alpha]
1   bc  [978, 121]  [alpha, bravo]

Upvotes: 0

jezrael
jezrael

Reputation: 862511

Use if both id values are integers (or both strings, same types):

d = df2.set_index('id')['name'].to_dict()
df1['names'] = [[d.get(y) for y in x] for x in df1['ids']]
print (df1)
  node         ids           names
0   ab       [978]         [alpha]
1   bc  [978, 121]  [alpha, bravo]

If possible value in list not match value of df2['id'] is replaced some no match value:

d = df2.set_index('id')['name'].to_dict()
df1['names'] = [[d.get(y, 'no match') for y in x] for x in df1['ids']]
print (df1)
  node         ids              names
0   ab   [978, 10]  [alpha, no match]
1   bc  [978, 121]     [alpha, bravo]

Or is possible omit this values:

d = df2.set_index('id')['name'].to_dict()
df1['names'] = [[d[y] for y in x if y in d.keys()] for x in df1['ids']]
print (df1)
  node         ids           names
0   ab   [978, 10]         [alpha]
1   bc  [978, 121]  [alpha, bravo]

Upvotes: 4

Related Questions