I have two DataFrames that I want to concatenate horizontally, matching rows by the value of a column. The pandas documentation (pandas.pydata.org) gives this example:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
'D': ['D2', 'D3', 'D6', 'D7'],
'F': ['F2', 'F3', 'F6', 'F7']},
index=[2, 3, 6, 7])
df1 =
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
df4 =
B D F
2 B2 D2 F2
3 B3 D3 F3
6 B6 D6 F6
7 B7 D7 F7
result = pd.concat([df1, df4], axis=1, join='inner')
result =
A B C D B D F
2 A2 B2 C2 D2 B2 D2 F2
3 A3 B3 C3 D3 B3 D3 F3
This works, and I'm happy with it. I'm using this trick to merge two DataFrames by the value of a certain column: I set that column as the index of each DataFrame and then concatenate. However, values in that column are repeated, so I end up with DataFrames that have duplicate index labels:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 3, 3, 2])
df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
'D': ['D2', 'D3', 'D6', 'D7'],
'F': ['F2', 'F3', 'F6', 'F7']},
index=[2, 3, 6, 7])
df1 =
A B C D
0 A0 B0 C0 D0
3 A1 B1 C1 D1
3 A2 B2 C2 D2
2 A3 B3 C3 D3
df4 =
B D F
2 B2 D2 F2
3 B3 D3 F3
6 B6 D6 F6
7 B7 D7 F7
I would expect these two DataFrames to join so that I end up with:
result =
A B C D B D F
3 A1 B1 C1 D1 B3 D3 F3
3 A2 B2 C2 D2 B3 D3 F3
2 A3 B3 C3 D3 B2 D2 F2
(Notice that the two rows with index 3 in df1 both join with the row with index 3 in df4.) However, this doesn't work:
ValueError: Shape of passed values is (7, 5), indices imply (7, 3)
How can I achieve that? If I could avoid merging by index and specify a column instead, that would be even better.
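For reference, a quick way to confirm that repeated index labels are present before concatenating. This is only a diagnostic sketch on the df1 from the failing example (columns trimmed for brevity); the name dup_labels is introduced here for illustration:

```python
import pandas as pd

# Same index as the failing df1 above; only one column kept for brevity.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3']},
                   index=[0, 3, 3, 2])

# False means at least one index label is repeated.
print(df1.index.is_unique)

# Labels that occur more than once (dup_labels is an illustrative name).
dup_labels = df1.index[df1.index.duplicated()].unique().tolist()
print(dup_labels)
```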
Upvotes: 1
Views: 53
Reputation: 153460
Another possible solution is to use join:
df1.join(df4, how='inner', lsuffix='_df1', rsuffix='_df4')
Output:
A B_df1 C D_df1 B_df4 D_df4 F
2 A3 B3 C3 D3 B2 D2 F2
3 A1 B1 C1 D1 B3 D3 F3
3 A2 B2 C2 D2 B3 D3 F3
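If matching on a column is preferable to matching on the index, join also takes an on argument naming a column of the calling frame, which is matched against the index of the other frame. A minimal sketch, assuming a hypothetical column named key (not in the original frames) that holds the values to match on:

```python
import pandas as pd

# 'key' is a hypothetical column standing in for the values used as the index above.
left = pd.DataFrame({'key': [0, 3, 3, 2],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': [2, 3, 6, 7],
                      'B': ['B2', 'B3', 'B6', 'B7'],
                      'F': ['F2', 'F3', 'F6', 'F7']})

# Match left['key'] against the index of the right frame;
# the suffixes disambiguate the overlapping column 'B'.
joined = left.join(right.set_index('key'), on='key', how='inner',
                   lsuffix='_df1', rsuffix='_df4')
print(joined)
```

Both rows of left with key 3 pick up the single key-3 row of right, which is the behaviour asked for in the question.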
Upvotes: 0
Reputation: 862511
One possible solution is merge, matching on the index. how='inner' is the default, so it can be omitted:
result = pd.merge(df1, df4, left_index=True, right_index=True)
print (result)
A B_x C D_x B_y D_y F
2 A3 B3 C3 D3 B2 D2 F2
3 A1 B1 C1 D1 B3 D3 F3
3 A2 B2 C2 D2 B3 D3 F3
If the matched index values are duplicated in both DataFrames, it creates every combination of the matching rows:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 3, 3, 3])
df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
'D': ['D2', 'D3', 'D6', 'D7'],
'F': ['F2', 'F3', 'F6', 'F7']},
index=[2, 3, 3, 7])
print (df1)
A B C D
0 A0 B0 C0 D0
3 A1 B1 C1 D1
3 A2 B2 C2 D2
3 A3 B3 C3 D3
print (df4)
B D F
2 B2 D2 F2
3 B3 D3 F3
3 B6 D6 F6
7 B7 D7 F7
result = pd.merge(df1, df4, left_index=True, right_index=True)
print (result)
A B_x C D_x B_y D_y F
3 A1 B1 C1 D1 B3 D3 F3
3 A1 B1 C1 D1 B6 D6 F6
3 A2 B2 C2 D2 B3 D3 F3
3 A2 B2 C2 D2 B6 D6 F6
3 A3 B3 C3 D3 B3 D3 F3
3 A3 B3 C3 D3 B6 D6 F6
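Since the question also asks about matching on a column rather than the index, here is a minimal sketch using merge with on. The column name key and the frames left, right and right_dup are assumptions made for illustration. The validate argument is optional; it asks pandas to check that the key is unique on the right side, so an unintended many-to-many expansion like the one above raises an error instead of silently multiplying rows:

```python
import pandas as pd

# 'key' is a hypothetical column holding the values to match on.
left = pd.DataFrame({'key': [0, 3, 3, 2],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': [2, 3, 6, 7],
                      'B': ['B2', 'B3', 'B6', 'B7'],
                      'F': ['F2', 'F3', 'F6', 'F7']})

# Inner merge on the column; validate checks that 'key' is unique in `right`.
result = pd.merge(left, right, on='key', suffixes=('_df1', '_df4'),
                  validate='many_to_one')
print(result)

# With a repeated key on the right, the same check raises MergeError.
right_dup = right.assign(key=[2, 3, 3, 7])
try:
    pd.merge(left, right_dup, on='key', validate='many_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```

Drop validate if the row combinations are actually wanted.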
Upvotes: 1