merge two DataFrame with two columns and keep the same order with original indexes in the result

Question

I have two pandas data frames. Both data frames have two key columns and one value column for merge. I want to keep the same order with original indexes in the merged result.

The keys and values might be missing or changed in the other data frame.
The order of data are important. You can't sort them by the keys or values in the merged result.

It should looks like this:

df1_index / df2_index / results are just used for demonstration.

I tried to use merge with outer:

df1 = pd.DataFrame({
    "key1": ['K', 'K', 'A1', 'A2', 'B1', 'B9', 'C3'],
    "key2": ['a5', 'a4', 'a7', 'a9', 'b2', 'b8', 'c1'],
    "Value1": ['apple', 'guava', 'kiwi', 'grape', 'banana', 'peach', 'berry'],
})

df2 = pd.DataFrame({
    "key1": ['K', 'A1', 'A3', 'B1', 'C2', 'C3'],
    "key2": ['a9', 'a7', 'a9', 'b2', 'c7', 'c1'],
    "Value2": ['apple', 'kiwi', 'grape', 'banana', 'guava', 'orange'],
})

merged_df = pd.merge(df1, df2, how="outer", on=['key1', 'key2'])

but it just added missing keys in the end of rows:

How do I merge and align them up?

Haleemur Ali · Accepted Answer

when constructing the merged dataframe, get the index values from each dataframe.

merged_df = pd.merge(df1, df2, how="outer", on=['key1', 'key2'])

use combine_first to combine index_x & index_y

merged_df['combined_index'] =merged_df.index_x.combine_first(merged_df.index_y)

sort using combined_index & index_x dropping columns which are not needed & resetting index.

output = merged_df.sort_values(
    ['combined_index', 'index_x']
).drop(
    ['index_x', 'index_y', 'combined_index'], axis=1
).reset_index(drop=True)

This results in the following output:

  key1 key2  Value1  Value2
0    K   a5   apple     NaN
1    K   a9     NaN   apple
2    K   a4   guava     NaN
3   A1   a7    kiwi    kiwi
4   A3   a9     NaN   grape
5   A2   a9   grape     NaN
6   B1   b2  banana  banana
7   C2   c7     NaN   guava
8   B9   b8   peach     NaN
9   C3   c1   berry  orange

merge two DataFrame with two columns and keep the same order with original indexes in the result

Answers (1)

Related Questions