philMarius
philMarius

Reputation: 694

Adding new columns to dataframe from other dataframe according to list of indices in other dataframe

I have two dataframes, each row in dataframe A has a list of indices corresponding to entries in dataframe B and a set of other values. I want to join the two dataframes in a way so that each of the entries in B has the other values in A where the index of the entry in B is in the list of indices in the entry in A.

So far, I have found a way of extracting the rows in B for the list of indices in each row in A but only row-by-row from this answer but then I am not sure where to go from here? Also not sure if there's a better way of doing it with Pandas dynamically as the size of the list of indices can change.

import pandas as pd
import numpy as np

# Inputs
A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})

print(A)
>>    indices a1   a2
>> 0  [0, 1]  a  100
>> 1  [2, 3]  b  200
>> 2  [4, 5]  c  300

B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})

print(B)
>>     b
>> 0  10
>> 1  20
>> 2  30
>> 3  40
>> 4  50
>> 5  60

# This is the desired output
out = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60],
    "a1": ["a","a", "b", "b", "c", "c"],
    "a2": [100,100,200,200,300,300]
})

print(out)
>>      b a1   a2
>> 0  10  a  100
>> 1  20  a  100
>> 2  30  b  200
>> 3  40  b  200
>> 4  50  c  300
>> 5  60  c  300

Upvotes: 4

Views: 224

Answers (5)

Jeff
Jeff

Reputation: 251

You can also use melt instead of stack, but it's more complicated as you must drop columns you don't need:

import pandas as pd
import numpy as np

# Inputs
A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})

B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})

AA = pd.concat([A.indices.apply(pd.Series), A], axis=1)
AA.drop(['indices'], axis=1, inplace=True)
print(AA)

   0  1 a1   a2
0  0  1  a  100
1  2  3  b  200
2  4  5  c  300

AA = AA.melt(id_vars=['a1', 'a2'], value_name='val').drop(['variable'], axis=1)
print(AA)

  a1   a2  val
0  a  100    0
1  b  200    2
2  c  300    4
3  a  100    1
4  b  200    3
5  c  300    5

pd.merge(AA.set_index(['val']), B, left_index=True, right_index=True)

Out[8]: 
  a1   a2   b
0  a  100  10
2  b  200  30
4  c  300  50
1  a  100  20
3  b  200  40
5  c  300  60

Upvotes: 2

Ankur Sinha
Ankur Sinha

Reputation: 6669

If you have pandas >=0.25, you can use explode:

C = A.explode('indices')

This gives:

  indices a1   a2
0       0  a  100
0       1  a  100
1       2  b  200
1       3  b  200
2       4  c  300
2       5  c  300

Then do:

output = pd.merge(B, C, left_index = True, right_on = 'indices')
output.index = output.indices.values    
output.drop('indices', axis = 1, inplace = True)

Final Output:

    b a1   a2
0  10  a  100
1  20  a  100
2  30  b  200
3  40  b  200
4  50  c  300
5  60  c  300

Upvotes: 5

braintho
braintho

Reputation: 401

This solution will handle indices of varying lengths.

A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})
A = A.indices.apply(pd.Series) \
    .merge(A, left_index = True, right_index = True) \
    .drop(["indices"], axis = 1)\
    .melt(id_vars = ['a1', 'a2'], value_name = "index")\
    .drop("variable", axis = 1)\
    .dropna()
A = A.set_index('index')
B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})
B
B.merge(A,left_index=True,right_index=True)

Final Output:

    b   a1  a2
0   10  a   100
1   20  a   100
2   30  b   200
3   40  b   200
4   50  c   300
5   60  c   300

Upvotes: 1

iamklaus
iamklaus

Reputation: 3770

using pd.merge

df2 = pd.DataFrame(A.set_index(['a1','a2']).indices)

df = pd.DataFrame(df2.indices.values.tolist(), index=a.index).stack().reset_index().drop('level_2', axis=1).set_index(0)

pd.merge(B,df,left_index=True, right_index=True)

Output

    b a1   a2
0  10  a  100
1  20  a  100
2  30  b  200
3  40  b  200
4  50  c  300
5  60  c  300

Upvotes: 3

Josh Friedlander
Josh Friedlander

Reputation: 11657

Here you go:

helper = A.indices.apply(pd.Series).stack().reset_index(level=1, drop=True)
A = A.reindex(helper.index).drop(columns=['indices'])
A['indices'] = helper
B = B.merge(A, left_index=True, right_on='indices').drop(columns=['indices']).reset_index(drop=True)

Result:

    b   a1  a2
0   10  a   100
1   20  a   100
2   30  b   200
3   40  b   200
4   50  c   300
5   60  c   300

Upvotes: 2

Related Questions