Ventoii

Reputation: 46

Dataframe lookup with lists, and multiple dataframes

I have two dataframes:

df1 = pd.DataFrame({'ID1': ['2456-AA', '2457-AA', '2458-AA', '2657-AA'],
                    'QT': [10, 5, 45, 10],
                    'ID2': [['5365-BB'], ['5365-BB', '6552-BB'], ['6552-BB', '7878-BB'], ['5365-BB', '2323-BB', '2556-BB']]})

df2 = pd.DataFrame({'ID2': ['5365-BB', '6552-BB', '5858-BB', '2323-BB'],
                    'TASK': ['Check pot', 'Check Equip1.', 'Check Equip2.', 'Check Equip3.']})

df1
       ID1  QT                          ID2
0  2456-AA  10                    [5365-BB]
1  2457-AA   5           [5365-BB, 6552-BB]
2  2458-AA  45           [6552-BB, 7878-BB]
3  2657-AA  10  [5365-BB, 2323-BB, 2556-BB]

df2
       ID2           TASK
0  5365-BB      Check pot
1  6552-BB  Check Equip1.
2  5858-BB  Check Equip2.
3  2323-BB  Check Equip3.

I would like to look up the values of ID2 in df2 against ID2 in df1 and return the matching ID1 values as a list for each ID2 in df2. The end result would look something like this:

df2
       ID2           TASK                          ID1
0  5365-BB      Check pot  [2456-AA, 2457-AA, 2657-AA]
1  6552-BB  Check Equip1.           [2457-AA, 2458-AA]
2  5858-BB  Check Equip2.                           []
3  2323-BB  Check Equip3.                    [2657-AA]

I've tried to do this by re-using a lambda-based method I used before, but I keep getting blank values. Thank you for any support.

EDIT

It is worth mentioning that the datasets I'm working on have more columns than in this example. Please let me know if you need more information.

Upvotes: 1

Views: 250

Answers (4)

jezrael

Reputation: 862601

Use DataFrame.explode with GroupBy.agg to aggregate ID1 into lists, then add the values from df2['ID2'] that have no match with Series.reindex, and pass the result to Series.map or DataFrame.join:

s = (df1.explode('ID2')
        .groupby('ID2')['ID1']
        .agg(list)
        .reindex(df2['ID2'].unique(), fill_value=[]))
df2['ID1'] = df2['ID2'].map(s)
print (df2)
       ID2           TASK                          ID1
0  5365-BB      Check pot  [2456-AA, 2457-AA, 2657-AA]
1  6552-BB  Check Equip1.           [2457-AA, 2458-AA]
2  5858-BB  Check Equip2.                           []
3  2323-BB  Check Equip3.                    [2657-AA]

df2 = df2.join(s.rename('ID1'), on='ID2')
print (df2)

       ID2           TASK                          ID1
0  5365-BB      Check pot  [2456-AA, 2457-AA, 2657-AA]
1  6552-BB  Check Equip1.           [2457-AA, 2458-AA]
2  5858-BB  Check Equip2.                           []
3  2323-BB  Check Equip3.                    [2657-AA]

Upvotes: 1

Shubham Sharma

Reputation: 71689

Let's explode the dataframe df1 on column ID2, then groupby ID2 and agg ID1 using list to create a mapping series m. Then map ID2 in df2 with the series m; finally, use boolean indexing to fill the NaN values with []:

m = df1.explode('ID2').groupby('ID2')['ID1'].agg(list)
df2['ID1'] = df2['ID2'].map(m)
df2.loc[df2['ID1'].isna(), 'ID1'] = [[]] * df2['ID1'].isna().sum()

print(df2)

       ID2           TASK                          ID1
0  5365-BB      Check pot  [2456-AA, 2457-AA, 2657-AA]
1  6552-BB  Check Equip1.           [2457-AA, 2458-AA]
2  5858-BB  Check Equip2.                           []
3  2323-BB  Check Equip3.                    [2657-AA]

Upvotes: 1

tgrandje

Reputation: 2514

A lambda function is not a good idea here. What you should do is recreate the dataframes without lists and afterwards merge the datasets (and at the end, recreate the lists if you really need them); something like this (I haven't tested it):

# Flatten df1: ID1 is a scalar, only the ID2 column holds lists,
# so each row is expanded into one row per ID2 value
array = df1.values.tolist()
array = [[id1, qt, id2] for [id1, qt, id2_list] in array for id2 in id2_list]
df1_altered = pd.DataFrame(array, columns=['ID1', 'QT', 'ID2'])

# df2 already has one ID2 per row, so only the column name needs adjusting
df2_altered = df2.rename(columns={'TASK': 'Task'})

# Left join, then collect the matching ID1 values back into lists
# (dropping NaN for ID2 values that have no match in df1)
df = df2_altered.merge(df1_altered, on='ID2', how='left')
df = df.groupby(['ID2', 'Task'])['ID1'].apply(lambda x: [y for y in list(x) if not pd.isnull(y)]).reset_index(drop=False)
# Optional: wrap ID2 back into single-element lists if you really need lists here
df['ID2'] = df['ID2'].apply(lambda x: [x])

Upvotes: 1

Tess

Reputation: 11

You'd need to have a "common ID" in both data sets. The ID2 in df2 is fine, but the ID2 in df1 would need some manipulation. You may need to separate the ID2 in df1 into 3 columns, for example ID1, row 1: [2456-AA, 2457-AA, 2657-AA] into [2456-AA], [2457-AA], [2657-AA].

Then perform the left join (similar to VLOOKUP in Excel):

df2 left join df1 by ID2 (can be done in several languages)
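
For reference, a minimal pandas sketch of this idea, assuming the df1 and df2 from the question; it splits the ID2 lists into extra rows with explode rather than into separate columns, which leads to the same join:

import pandas as pd

df1 = pd.DataFrame({'ID1': ['2456-AA', '2457-AA', '2458-AA', '2657-AA'],
                    'QT': [10, 5, 45, 10],
                    'ID2': [['5365-BB'], ['5365-BB', '6552-BB'], ['6552-BB', '7878-BB'], ['5365-BB', '2323-BB', '2556-BB']]})
df2 = pd.DataFrame({'ID2': ['5365-BB', '6552-BB', '5858-BB', '2323-BB'],
                    'TASK': ['Check pot', 'Check Equip1.', 'Check Equip2.', 'Check Equip3.']})

# Give df1 one ID2 per row so both frames share a "common ID"
flat = df1.explode('ID2')[['ID1', 'ID2']]

# Left join onto df2 (the VLOOKUP-like step), then gather matches back into lists
merged = df2.merge(flat, on='ID2', how='left')
result = (merged.groupby(['ID2', 'TASK'], sort=False)['ID1']
                .apply(lambda s: s.dropna().tolist())
                .reset_index())
print(result)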

Upvotes: 0
