Bart
Bart

Reputation: 10248

Pandas DataFrame merge resets index

I'm trying to merge two Pandas DataFrames, where (possibly) there are some duplicate records. The merge works as expected, but unfortunately, it seems to reset the index. Consider this simple example:

import pandas as pd
import numpy as np

index = np.arange(10,15)
data  = np.arange(10,15)
df1 = pd.DataFrame({'data':data}, index=index)

index = np.arange(12,17)
data  = np.arange(12,17)
df2 = pd.DataFrame({'data':data}, index=index)

df3 = df1.merge(df2, how='outer')

This results in:

# df1:       df2:         df3: 
# ------     # ------     # ------
    data         data         data
10    10     12    12     0    10
11    11     13    13     1    11
12    12     14    14     2    12
13    13     15    15     3    13
14    14     16    16     4    14
                          5    15 
                          6    16

So the indexes in df1 (10-14) and df2 (12-16) have been replaced by (0-6) in the merged df3. Is there a logical explanation for this behavior? Why aren't the indexes in df3 (10-16)?

The only workaround seems to be to reset the indexes in df1 and df2 with .reset_index(), do the merge, and setting the index again in df3 with df3.set_index('index'), which does result in:

In [97]: df3.index
Out[97]: Int64Index([10, 11, 12, 13, 14, 15, 16], dtype='int64', name='index')

Is there a way to get to this result, without resetting and re-resetting the indexes?

Upvotes: 2

Views: 3811

Answers (2)

nslamberth
nslamberth

Reputation: 1217

You're looking for concat rather than merge. Check the below:

In [13]: pd.concat([df1, df2], axis=0)
Out[13]: 
    data
10    10
11    11
12    12
13    13
14    14
12    12
13    13
14    14
15    15
16    16

Upvotes: 3

Roelant
Roelant

Reputation: 5119

I guess I would use

 df3.index = df3['data'].values

Upvotes: 0

Related Questions