Concat two DataFrames on missing indices

Question

I have two DataFrames and want to use the second one only on the rows whose index is not already contained in the first one.

What is the most efficient way to do this?

Example:

df_1
idx     val
0      0.32
1      0.54
4      0.26
5      0.76
7      0.23

df_2
idx     val
1     10.24
2     10.90
3     10.66
4     10.25
6     10.13
7     10.52

df_final
idx     val
0      0.32
1      0.54
2     10.90
3     10.66
4      0.26
5      0.76
6     10.13
7      0.23

Recap: I need to add the rows in df_2 for which the index is not already in df_1.

EDIT

Removed some indices in df_2 to illustrate the fact that all indices from df_1 are not covered in df_2.

sodd · Accepted Answer

You can achieve the wanted output by using the combine_first method of the DataFrame. From the documentation of the method:

Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns

Example usage:

import pandas as pd

df_1 = pd.DataFrame([0.32,0.54,0.26,0.76,0.23], columns=['val'], index=[0,1,4,5,7])
df_1.index.name = 'idx'

df_2 = pd.DataFrame([10.56,10.24,10.90,10.66,10.25,10.13,10.52], columns=['val'], index=[0,1,2,3,4,6,7])
df_2.index.name = 'idx'

df_final = df_1.combine_first(df_2)

This will give the desired result:

In [7]: df_final
Out[7]:
       val
idx       
0     0.32
1     0.54
2    10.90
3    10.66
4     0.26
5     0.76
6    10.13
7     0.23

Concat two DataFrames on missing indices

Answers (2)

Related Questions