Reputation: 3387
I have two pandas data frames. Both data frames have two key columns and one value column for merge. I want to keep the same order with original indexes in the merged result.
It should looks like this:
df1_index
/df2_index
/results
are just used for demonstration.
I tried to use merge
with outer
:
df1 = pd.DataFrame({
"key1": ['K', 'K', 'A1', 'A2', 'B1', 'B9', 'C3'],
"key2": ['a5', 'a4', 'a7', 'a9', 'b2', 'b8', 'c1'],
"Value1": ['apple', 'guava', 'kiwi', 'grape', 'banana', 'peach', 'berry'],
})
df2 = pd.DataFrame({
"key1": ['K', 'A1', 'A3', 'B1', 'C2', 'C3'],
"key2": ['a9', 'a7', 'a9', 'b2', 'c7', 'c1'],
"Value2": ['apple', 'kiwi', 'grape', 'banana', 'guava', 'orange'],
})
merged_df = pd.merge(df1, df2, how="outer", on=['key1', 'key2'])
but it just added missing keys in the end of rows:
How do I merge and align them up?
Upvotes: 2
Views: 1269
Reputation: 28253
when constructing the merged dataframe, get the index values from each dataframe.
merged_df = pd.merge(df1, df2, how="outer", on=['key1', 'key2'])
use combine_first
to combine index_x
& index_y
merged_df['combined_index'] =merged_df.index_x.combine_first(merged_df.index_y)
sort using combined_index
& index_x
dropping columns which are not needed & resetting index.
output = merged_df.sort_values(
['combined_index', 'index_x']
).drop(
['index_x', 'index_y', 'combined_index'], axis=1
).reset_index(drop=True)
This results in the following output:
key1 key2 Value1 Value2
0 K a5 apple NaN
1 K a9 NaN apple
2 K a4 guava NaN
3 A1 a7 kiwi kiwi
4 A3 a9 NaN grape
5 A2 a9 grape NaN
6 B1 b2 banana banana
7 C2 c7 NaN guava
8 B9 b8 peach NaN
9 C3 c1 berry orange
Upvotes: 1