Reputation: 694
I have two dataframes, each row in dataframe A
has a list of indices corresponding to entries in dataframe B
and a set of other values. I want to join the two dataframes in a way so that each of the entries in B
has the other values in A
where the index of the entry in B
is in the list of indices in the entry in A
.
So far, I have found a way of extracting the rows in B
for the list of indices in each row in A
but only row-by-row from this answer but then I am not sure where to go from here? Also not sure if there's a better way of doing it with Pandas dynamically as the size of the list of indices can change.
import pandas as pd
import numpy as np
# Inputs
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
print(A)
>> indices a1 a2
>> 0 [0, 1] a 100
>> 1 [2, 3] b 200
>> 2 [4, 5] c 300
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
print(B)
>> b
>> 0 10
>> 1 20
>> 2 30
>> 3 40
>> 4 50
>> 5 60
# This is the desired output
out = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60],
"a1": ["a","a", "b", "b", "c", "c"],
"a2": [100,100,200,200,300,300]
})
print(out)
>> b a1 a2
>> 0 10 a 100
>> 1 20 a 100
>> 2 30 b 200
>> 3 40 b 200
>> 4 50 c 300
>> 5 60 c 300
Upvotes: 4
Views: 224
Reputation: 251
You can also use melt instead of stack, but it's more complicated as you must drop columns you don't need:
import pandas as pd
import numpy as np
# Inputs
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
AA = pd.concat([A.indices.apply(pd.Series), A], axis=1)
AA.drop(['indices'], axis=1, inplace=True)
print(AA)
0 1 a1 a2
0 0 1 a 100
1 2 3 b 200
2 4 5 c 300
AA = AA.melt(id_vars=['a1', 'a2'], value_name='val').drop(['variable'], axis=1)
print(AA)
a1 a2 val
0 a 100 0
1 b 200 2
2 c 300 4
3 a 100 1
4 b 200 3
5 c 300 5
pd.merge(AA.set_index(['val']), B, left_index=True, right_index=True)
Out[8]:
a1 a2 b
0 a 100 10
2 b 200 30
4 c 300 50
1 a 100 20
3 b 200 40
5 c 300 60
Upvotes: 2
Reputation: 6669
If you have pandas >=0.25, you can use explode:
C = A.explode('indices')
This gives:
indices a1 a2
0 0 a 100
0 1 a 100
1 2 b 200
1 3 b 200
2 4 c 300
2 5 c 300
Then do:
output = pd.merge(B, C, left_index = True, right_on = 'indices')
output.index = output.indices.values
output.drop('indices', axis = 1, inplace = True)
Final Output:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
Upvotes: 5
Reputation: 401
This solution will handle indices of varying lengths.
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
A = A.indices.apply(pd.Series) \
.merge(A, left_index = True, right_index = True) \
.drop(["indices"], axis = 1)\
.melt(id_vars = ['a1', 'a2'], value_name = "index")\
.drop("variable", axis = 1)\
.dropna()
A = A.set_index('index')
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
B
B.merge(A,left_index=True,right_index=True)
Final Output:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
Upvotes: 1
Reputation: 3770
using pd.merge
df2 = pd.DataFrame(A.set_index(['a1','a2']).indices)
df = pd.DataFrame(df2.indices.values.tolist(), index=a.index).stack().reset_index().drop('level_2', axis=1).set_index(0)
pd.merge(B,df,left_index=True, right_index=True)
Output
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
Upvotes: 3
Reputation: 11657
Here you go:
helper = A.indices.apply(pd.Series).stack().reset_index(level=1, drop=True)
A = A.reindex(helper.index).drop(columns=['indices'])
A['indices'] = helper
B = B.merge(A, left_index=True, right_on='indices').drop(columns=['indices']).reset_index(drop=True)
Result:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
Upvotes: 2