Reputation: 515
I have two DataFrames.
df1 has the following form:
   ID  col1  col2
0   1     2    10
1   3     1    21
and df2 looks like this:
   ID  field1  field2
0   1       4       1
1   1       3       3
2   3       5       4
3   3       9       5
4   1       2       0
I want to combine both DataFrames so that there is only one row per ID, like this:
   ID  col1  col2  field1_1  field2_1  field1_2  field2_2  field1_3  field2_3
0   1     2    10         4         1         3         3         2         0
1   3     1    21         5         4         9         5
I have tried merging and pivoting the data with df.pivot(index=df1.index, columns='ID'),
but because the number of rows per ID varies, I get a ValueError:
ValueError: all arrays must be same length
Upvotes: 1
Views: 146
Reputation: 294488
Without worrying too much about formatting, we want to merge and add a MultiIndex level that counts the rows within each 'ID'.
# merge on the shared 'ID' column
df = df1.merge(df2)
# number the rows within each 'ID': 0, 1, 2, ...
cc = df.groupby('ID').cumcount()
df.set_index(['ID', 'col1', 'col2', cc]).unstack()
             field1           field2
                  0    1    2      0    1    2
ID col1 col2
1  2    10      4.0  3.0  2.0    1.0  3.0  0.0
3  1    21      5.0  9.0  NaN    4.0  5.0  NaN
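To see what the counter contributes, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the tables in the question and prints the merged frame next to its per-ID counter. The names `merged` and `n` are only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({"ID": [1, 3], "col1": [2, 1], "col2": [10, 21]})
df2 = pd.DataFrame({
    "ID": [1, 1, 3, 3, 1],
    "field1": [4, 3, 5, 9, 2],
    "field2": [1, 3, 4, 5, 0],
})

# With no `on=` argument, merge joins on the columns both frames share ('ID').
merged = df1.merge(df2)

# cumcount numbers the rows inside each 'ID' group, starting at 0.
cc = merged.groupby("ID").cumcount()

# Show the counter as an extra column 'n' (illustrative name).
print(merged.assign(n=cc))
```

ID 1 has three matching rows and ID 3 only two, so the counter runs 0, 1, 2 for ID 1 and 0, 1 for ID 3. Unstacking on it leaves NaN in the slot where ID 3 has no row, which is also why the integer fields come back as floats.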
We can nail down the formatting with:
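df = df1.merge(df2)
# start the counter at 1 so the suffixes read _1, _2, _3
cc = df.groupby('ID').cumcount() + 1
# sort the columns by the counter level so field1_n and field2_n sit side by side
d1 = df.set_index(['ID', 'col1', 'col2', cc]).unstack().sort_index(axis=1, level=1)
# flatten the (field, counter) column pairs into 'field_counter' strings
d1.columns = d1.columns.to_series().map('{0[0]}_{0[1]}'.format)
d1.reset_index()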
   ID  col1  col2  field1_1  field2_1  field1_2  field2_2  field1_3  field2_3
0   1     2    10       4.0       1.0       3.0       3.0       2.0       0.0
1   3     1    21       5.0       4.0       9.0       5.0       NaN       NaN
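For reuse, the same steps could be wrapped in a small function. This is only a sketch: `widen` and its `key` parameter are hypothetical names, and the sample frames are rebuilt from the question. The last step is an optional extra that is not part of the steps above: casting to pandas' nullable `Int64` dtype keeps the values as integers, closer to the output the question asked for.

```python
import pandas as pd

def widen(df1, df2, key="ID"):
    """Hypothetical helper: one row per key, df2's fields spread into numbered columns."""
    df = df1.merge(df2, on=key)
    # Number the rows within each key, starting at 1, to use as the column suffix.
    cc = df.groupby(key).cumcount() + 1
    wide = (
        df.set_index(list(df1.columns) + [cc])  # counter becomes the last index level
          .unstack()                            # move the counter into the columns
          .sort_index(axis=1, level=1)          # order columns by counter, then field
    )
    # Flatten (field, counter) pairs into 'field_counter' names.
    wide.columns = [f"{field}_{n}" for field, n in wide.columns]
    return wide.reset_index()

df1 = pd.DataFrame({"ID": [1, 3], "col1": [2, 1], "col2": [10, 21]})
df2 = pd.DataFrame({
    "ID": [1, 1, 3, 3, 1],
    "field1": [4, 3, 5, 9, 2],
    "field2": [1, 3, 4, 5, 0],
})

out = widen(df1, df2)

# Optional: nullable integers instead of floats; missing slots show as <NA>.
value_cols = [c for c in out.columns if c not in df1.columns]
out_int = out.astype({c: "Int64" for c in value_cols})
print(out_int)
```

The function assumes every column of df1 is constant per key, as in the question. If df1 had several rows per key, they would end up as separate rows in the result.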
Upvotes: 1