Reputation: 10248
I'm trying to merge two Pandas DataFrames
, where (possibly) there are some duplicate records. The merge works as expected, but unfortunately, it seems to reset the index. Consider this simple example:
import pandas as pd
import numpy as np
index = np.arange(10,15)
data = np.arange(10,15)
df1 = pd.DataFrame({'data':data}, index=index)
index = np.arange(12,17)
data = np.arange(12,17)
df2 = pd.DataFrame({'data':data}, index=index)
df3 = df1.merge(df2, how='outer')
This results in:
# df1: df2: df3:
# ------ # ------ # ------
data data data
10 10 12 12 0 10
11 11 13 13 1 11
12 12 14 14 2 12
13 13 15 15 3 13
14 14 16 16 4 14
5 15
6 16
So the indexes in df1
(10-14) and df2
(12-16) have been replaced by (0-6) in the merged df3
. Is there a logical explanation for this behavior? Why aren't the indexes in df3
(10-16)?
The only workaround seems to be to reset the indexes in df1
and df2
with .reset_index()
, do the merge, and setting the index again in df3
with df3.set_index('index')
, which does result in:
In [97]: df3.index
Out[97]: Int64Index([10, 11, 12, 13, 14, 15, 16], dtype='int64', name='index')
Is there a way to get to this result, without resetting and re-resetting the indexes?
Upvotes: 2
Views: 3811
Reputation: 1217
You're looking for concat
rather than merge
. Check the below:
In [13]: pd.concat([df1, df2], axis=0)
Out[13]:
data
10 10
11 11
12 12
13 13
14 14
12 12
13 13
14 14
15 15
16 16
Upvotes: 3