Reputation: 46
I have two dataframes:
df1 = pd.DataFrame({'ID1': ['2456-AA', '2457-AA', '2458-AA', '2657-AA'],
                    'QT': [10, 5, 45, 10],
                    'ID2': [['5365-BB'], ['5365-BB', '6552-BB'],
                            ['6552-BB', '7878-BB'],
                            ['5365-BB', '2323-BB', '2556-BB']]})
df2 = pd.DataFrame({'ID2': ['5365-BB', '6552-BB', '5858-BB', '2323-BB'],
                    'TASK': ['Check pot', 'Check Equip1.', 'Check Equip2.', 'Check Equip3.']})
df1
ID1 QT ID2
0 2456-AA 10 [5365-BB]
1 2457-AA 5 [5365-BB, 6552-BB]
2 2458-AA 45 [6552-BB, 7878-BB]
3 2657-AA 10 [5365-BB, 2323-BB, 2556-BB]
df2
ID2 TASK
0 5365-BB Check pot
1 6552-BB Check Equip1.
2 5858-BB Check Equip2.
3 2323-BB Check Equip3.
I would like to look up each ID2 value of df2 in the ID2 lists of df1 and return the matching ID1 values as a list for each ID2 in df2. The end result would look something like this:
df2
ID2 TASK ID1
0 5365-BB Check pot [2456-AA, 2457-AA, 2657-AA]
1 6552-BB Check Equip1. [2457-AA, 2458-AA]
2 5858-BB Check Equip2. []
3 2323-BB Check Equip3. [2657-AA]
I've tried to do this by reusing a lambda-based method from an earlier problem, but I keep getting blank values. Thank you for any support.
EDIT
It is worth mentioning that the datasets I'm working on have more columns than in this example. Please let me know if you need more information.
Upvotes: 1
Views: 250
Reputation: 862601
Use DataFrame.explode with aggregation to list, then add the values of df2['ID2'] that have no match by Series.reindex, and pass the result to Series.map or DataFrame.join:
s = (df1.explode('ID2')
.groupby('ID2')['ID1']
.agg(list)
.reindex(df2['ID2'].unique(), fill_value=[]))
df2['ID1'] = df2['ID2'].map(s)
print (df2)
ID2 TASK ID1
0 5365-BB Check pot [2456-AA, 2457-AA, 2657-AA]
1 6552-BB Check Equip1. [2457-AA, 2458-AA]
2 5858-BB Check Equip2. []
3 2323-BB Check Equip3. [2657-AA]
df2 = df2.join(s.rename('ID1'), on='ID2')
print (df2)
ID2 TASK ID1
0 5365-BB Check pot [2456-AA, 2457-AA, 2657-AA]
1 6552-BB Check Equip1. [2457-AA, 2458-AA]
2 5858-BB Check Equip2. []
3 2323-BB Check Equip3. [2657-AA]
Upvotes: 1
Reputation: 71689
Let's explode the dataframe df1 on column ID2, then groupby on ID2 and agg ID1 using list to create a mapping series m. Then map the ID2 in df2 with the series m; finally, use boolean indexing to fill the NaN values with []:
m = df1.explode('ID2').groupby('ID2')['ID1'].agg(list)
df2['ID1'] = df2['ID2'].map(m)
df2.loc[df2['ID1'].isna(), 'ID1'] = [[]] * df2['ID1'].isna().sum()
print(df2)
ID2 TASK ID1
0 5365-BB Check pot [2456-AA, 2457-AA, 2657-AA]
1 6552-BB Check Equip1. [2457-AA, 2458-AA]
2 5858-BB Check Equip2. []
3 2323-BB Check Equip3. [2657-AA]
Upvotes: 1
Reputation: 2514
A lambda function is not a good idea here. What you should do instead is recreate the dataframes without lists, then merge the datasets (and, at the end, recreate lists if you really need them); something like this:
# flatten df1: ID1 and QT are scalars per row, only the ID2 lists need expanding
array = df1.values.tolist()
array = [[id1, qt, id2] for [id1, qt, id2_list] in array for id2 in id2_list]
df1_altered = pd.DataFrame(array, columns=['ID1', 'QT', 'ID2'])
# df2 has no list columns, so it can be merged as-is
df = df2.merge(df1_altered, on='ID2', how='left')
df = (df.groupby(['ID2', 'TASK'])['ID1']
        .apply(lambda x: [y for y in list(x) if not pd.isnull(y)])
        .reset_index(drop=False))
Upvotes: 1
Reputation: 11
You'd need a common ID in both data sets. The ID2 in df2 is fine, but the ID2 in df1 needs some manipulation: you'd have to split each ID2 list in df1 into separate rows, e.g. [5365-BB, 6552-BB] becomes one row with [5365-BB] and one with [6552-BB].
Then perform a left join (similar to VLOOKUP in Excel):
df2 left join df1 on ID2 (this can be done in several languages)
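The steps above can be sketched in pandas; this is one possible reading of the answer, using DataFrame.explode for the row-splitting and merge(how='left') as the left join / VLOOKUP step:

```python
import pandas as pd

df1 = pd.DataFrame({'ID1': ['2456-AA', '2457-AA', '2458-AA', '2657-AA'],
                    'QT': [10, 5, 45, 10],
                    'ID2': [['5365-BB'], ['5365-BB', '6552-BB'],
                            ['6552-BB', '7878-BB'],
                            ['5365-BB', '2323-BB', '2556-BB']]})
df2 = pd.DataFrame({'ID2': ['5365-BB', '6552-BB', '5858-BB', '2323-BB'],
                    'TASK': ['Check pot', 'Check Equip1.', 'Check Equip2.', 'Check Equip3.']})

# 1. split each ID2 list in df1 into separate rows
flat = df1.explode('ID2')

# 2. left join df2 onto the flattened df1 (the VLOOKUP step)
merged = df2.merge(flat[['ID1', 'ID2']], on='ID2', how='left')

# 3. collapse the matches back into one list per ID2 (NaN = no match -> [])
m = merged.groupby('ID2')['ID1'].apply(lambda s: s.dropna().tolist())
df2['ID1'] = df2['ID2'].map(m)
```

This reproduces the expected output, including the empty list for 5858-BB, which has no match in df1.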
Upvotes: 0