Reputation: 462
I have two dataframes, one with the input info and one with the output:
df_input:
index  col1  col2
0      'A'   'B'
1      'B'   'H'
2      'C'   'D'
df_output:
index  vectors
0      [[D, 0.5],[E, 0.3]]
1      [[A, 0.3]]
2      [[B, 0.8],[C, 0.5],[H, 0.2]]
The output is an array of arrays, and the number of inner arrays varies from row to row.
What I need is to match the rows by index and give every vector its own row, like this:
df:
index  col1  col2  val1  val2
0      'A'   'B'   'D'   0.5
1      'A'   'B'   'E'   0.3
2      'B'   'H'   'A'   0.3
3      'C'   'D'   'B'   0.8
4      'C'   'D'   'C'   0.5
5      'C'   'D'   'H'   0.2
The df is very large, so I'm trying to avoid a loop if possible.
Thank you in advance.
Upvotes: 0
Views: 681
Reputation: 12714
Split the list of lists into rows using the stack function. Then, for each row in the vectors column, convert the pair into a string and use split to create two columns, val1 and val2. Since split produces strings, cast val2 back to float. Use concat to attach the index column, join the result to df_input on index, and drop the index column at the end, since it is not needed in the final output.
import pandas as pd
my_dict = {'index': [0, 1, 2], 'col1': ['A', 'B', 'C'], 'col2': ['B', 'H', 'D']}
df_input = pd.DataFrame(my_dict)
my_dict = {'index': [0, 1, 2], 'vectors': [[['D', 0.5], ['E', 0.3]], [['A', 0.3]], [['B', 0.8], ['C', 0.5], ['H', 0.2]]]}
df_output = pd.DataFrame(my_dict)
# One row per inner [label, value] pair; dropna() discards the padding left by shorter lists
df_output = df_output.vectors.apply(pd.Series).stack().dropna().rename('vectors')
df_output = df_output.to_frame().reset_index(1, drop=True).reset_index()
# Turn each pair into a 'label,value' string and split it into two columns
df_tmp = df_output.vectors.apply(lambda x: ','.join(map(str, x))).str.split(',', expand=True)
df_tmp.columns = ['val1', 'val2']
df_tmp['val2'] = df_tmp['val2'].astype(float)  # split() returns strings
# Attach the key column, then join back onto the input rows
df_tmp = pd.concat([df_tmp, df_output['index']], axis=1, sort=False)
df_tmp = df_input.join(df_tmp.set_index('index'), on='index')
df_tmp = df_tmp.reset_index(drop=True).drop(columns=['index'])
print(df_tmp)
Result:
  col1 col2 val1  val2
0    A    B    D   0.5
1    A    B    E   0.3
2    B    H    A   0.3
3    C    D    B   0.8
4    C    D    C   0.5
5    C    D    H   0.2
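If your pandas version has DataFrame.explode (the ignore_index argument used here needs 1.1 or newer), the same reshaping can probably be done without the string round trip. This is only a sketch of a different technique, not the method described above. It assumes every row of vectors holds at least one [label, value] pair, and the names exploded, pairs and result are just illustrative.

```python
import pandas as pd

df_input = pd.DataFrame({'index': [0, 1, 2],
                         'col1': ['A', 'B', 'C'],
                         'col2': ['B', 'H', 'D']})
df_output = pd.DataFrame({'index': [0, 1, 2],
                          'vectors': [[['D', 0.5], ['E', 0.3]],
                                      [['A', 0.3]],
                                      [['B', 0.8], ['C', 0.5], ['H', 0.2]]]})

# One row per inner [label, value] pair; the 'index' key is repeated as needed.
exploded = df_output.explode('vectors', ignore_index=True)

# Build the two value columns straight from the pairs, so val2 stays numeric.
# Assumption: no empty lists, otherwise explode leaves NaN instead of a pair.
pairs = pd.DataFrame(exploded['vectors'].tolist(), columns=['val1', 'val2'])
pairs['index'] = exploded['index']

# Merge on the key column, then drop it from the final frame.
result = df_input.merge(pairs, on='index').drop(columns='index')
print(result)
```

Merging on the index column rather than on position means the two frames do not have to be in the same row order.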
Upvotes: 0
Reputation: 153570
Where:
import pandas as pd
input_vectors = pd.DataFrame({'vectors': [[['D', .5], ['E', .3]],
                                          [['A', .3]],
                                          [['B', .8], ['C', .5], ['H', .2]]]})
input_vectors
Output:
                          vectors
0            [[D, 0.5], [E, 0.3]]
1                      [[A, 0.3]]
2  [[B, 0.8], [C, 0.5], [H, 0.2]]
and
df_input
Output:
   index col1 col2
0      0    A    B
1      1    B    H
2      2    C    D
Use:
pd.concat([pd.DataFrame(x, index=[i]*len(x))
           for i, x in input_vectors.itertuples()])\
  .join(df_input)
Output:
   0    1  index col1 col2
0  D  0.5      0    A    B
0  E  0.3      0    A    B
1  A  0.3      1    B    H
2  B  0.8      2    C    D
2  C  0.5      2    C    D
2  H  0.2      2    C    D
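Building one small DataFrame per row can get slow on a very large frame. A possible variation, as a rough sketch only: flatten all pairs in a single pass and repeat the input rows by the length of each list. It assumes input_vectors and df_input have the same number of rows in the same order. The names lengths, flat, repeated and result, as well as the val1/val2 column labels, are my own choices to match the layout asked for in the question.

```python
import pandas as pd

df_input = pd.DataFrame({'index': [0, 1, 2],
                         'col1': ['A', 'B', 'C'],
                         'col2': ['B', 'H', 'D']})
input_vectors = pd.DataFrame({'vectors': [[['D', .5], ['E', .3]],
                                          [['A', .3]],
                                          [['B', .8], ['C', .5], ['H', .2]]]})

# Number of [label, value] pairs in each row.
lengths = input_vectors['vectors'].map(len).to_numpy()

# Flatten every pair into one list, in row order.
flat = [pair for row in input_vectors['vectors'] for pair in row]
pairs = pd.DataFrame(flat, columns=['val1', 'val2'])

# Repeat each input row once per pair (assumes both frames are aligned by position).
repeated = (df_input.loc[df_input.index.repeat(lengths), ['col1', 'col2']]
            .reset_index(drop=True))

# Both pieces now share a 0..n-1 index, so a column-wise concat lines them up.
result = pd.concat([repeated, pairs], axis=1)
print(result)
```

There is still a Python-level comprehension over the pairs, but only two DataFrames are constructed no matter how many rows there are.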
Upvotes: 2