Andrés Bustamante

Reputation: 462

Create dataframe mapping a list of arrays

I have two dataframes, one with the input info and one with the output:

df_input:
index col1 col2
 0    'A'  'B'
 1    'B'  'H'
 2    'C'  'D'

df_output:
index vectors
 0    [[D, 0.5],[E, 0.3]]
 1    [[A, 0.3]]
 2    [[B, 0.8],[C, 0.5],[H, 0.2]]

Each output row is an array of arrays, and the number of inner arrays varies from row to row.

What I need is to match the rows by index and put each vector in its own row, like this:

df:
index col1 col2 val1 val2
 0    'A'  'B'  'D'  0.5
 1    'A'  'B'  'E'  0.3
 2    'B'  'H'  'A'  0.3
 3    'C'  'D'  'B'  0.8
 4    'C'  'D'  'C'  0.5
 5    'C'  'D'  'H'  0.2

The df is very large, so I'm trying to avoid a loop if possible.

Thank you in advance.

Upvotes: 0

Views: 681

Answers (2)

jose_bacoy

Reputation: 12714

Split the list of lists into rows using the stack function. Then, for each row in the vectors column, convert the pair into a string and use the split function to create two columns, va1 and val2. Use concat to attach the index column to those two columns, then join the result to df_input on index. Finally, drop the index column, since it is not needed in the final output.

import pandas as pd
my_dict = {'index':[0,1,2], 'col1':['A','B','C'], 'col2':['B','H','D']}
df_input = pd.DataFrame(my_dict)
my_dict = {'index':[0,1,2],'vectors':[[['D', 0.5],['E', 0.3]],[['A', 0.3]],[['B', 0.8],['C', 0.5],['H', 0.2]]]}
df_output = pd.DataFrame(my_dict)

df_output = df_output.vectors.apply(pd.Series).stack().rename('vectors')
df_output = df_output.to_frame().reset_index(1, drop=True).reset_index()
df_tmp = df_output.vectors.apply(lambda x: ','.join(map(str, x))).str.split(',', expand=True)
df_tmp.columns = ['va1','val2']
df_tmp = pd.concat([df_tmp, df_output['index']], axis=1, sort=False)
df_tmp = df_input.join(df_tmp.set_index('index'), on='index')
df_tmp.reset_index(drop=True).drop(columns=['index'])

Result:

  col1 col2 va1 val2
0   A   B   D   0.5
1   A   B   E   0.3
2   B   H   A   0.3
3   C   D   B   0.8
4   C   D   C   0.5
5   C   D   H   0.2
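
One thing to watch with the string split above is that val2 ends up holding strings rather than floats. As a rough alternative sketch, assuming pandas 0.25 or later (for Series.explode) and that every row of vectors holds at least one [label, value] pair, the same result can be built without the string round trip. The names exploded, pairs and result are only illustrative:

```python
import pandas as pd

df_input = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': ['B', 'H', 'D']})
df_output = pd.DataFrame({'vectors': [[['D', 0.5], ['E', 0.3]],
                                      [['A', 0.3]],
                                      [['B', 0.8], ['C', 0.5], ['H', 0.2]]]})

# One row per inner [label, value] pair; the original row index is repeated.
exploded = df_output['vectors'].explode()

# Split each pair into two columns while keeping the repeated index.
# Assumption: no row has an empty list, otherwise explode() yields NaN here.
pairs = pd.DataFrame(exploded.tolist(), index=exploded.index,
                     columns=['val1', 'val2'])

# Align with the input rows on the index, then renumber the rows 0..n-1.
result = df_input.join(pairs).reset_index(drop=True)
print(result)
```

Because the pairs are never converted to text, val2 should keep its float dtype, which matters if the values are used in later calculations.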

Upvotes: 0

Scott Boston

Reputation: 153570

Given:

input_vectors = pd.DataFrame({'vectors':[[['D', .5],['E',.3]],
                                         [['A',.3]],
                                         [['B',.8],['C',.5],['H',.2]]]})
input_vectors

Output:

                          vectors
0            [[D, 0.5], [E, 0.3]]
1                      [[A, 0.3]]
2  [[B, 0.8], [C, 0.5], [H, 0.2]]

and

df_input

Output:

   index col1 col2
0      0    A    B
1      1    B    H
2      2    C    D

Use:

pd.concat([pd.DataFrame(x, index=[i]*len(x)) 
            for i, x in input_vectors.itertuples()])\
  .join(df_input)

Output:

   0    1  index col1 col2
0  D  0.5      0    A    B
0  E  0.3      0    A    B
1  A  0.3      1    B    H
2  B  0.8      2    C    D
2  C  0.5      2    C    D
2  H  0.2      2    C    D
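
If the exact layout from the question is wanted (columns col1, col2, val1, val2 and a fresh 0..n-1 index), the result above can be tidied up in a few more steps. This is only a sketch building on the same concat and join; the names expanded and result are illustrative:

```python
import pandas as pd

df_input = pd.DataFrame({'index': [0, 1, 2],
                         'col1': ['A', 'B', 'C'],
                         'col2': ['B', 'H', 'D']})
input_vectors = pd.DataFrame({'vectors': [[['D', .5], ['E', .3]],
                                          [['A', .3]],
                                          [['B', .8], ['C', .5], ['H', .2]]]})

# Same idea as above: one small frame per row, indexed by the source row label.
expanded = pd.concat([pd.DataFrame(x, index=[i] * len(x))
                      for i, x in input_vectors.itertuples()])

result = (expanded.join(df_input)
          .rename(columns={0: 'val1', 1: 'val2'})  # name the two vector columns
          .drop(columns='index')                   # helper column from df_input
          .reset_index(drop=True)                  # renumber rows 0..n-1
          [['col1', 'col2', 'val1', 'val2']])      # order columns as in the question
print(result)
```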

Upvotes: 2
