Jivan
Jivan

Reputation: 23078

Concat two DataFrames on missing indices

I have two DataFrames and want to use the second one only on the rows whose index is not already contained in the first one.

What is the most efficient way to do this?

Example:

df_1
idx     val
0      0.32
1      0.54
4      0.26
5      0.76
7      0.23

df_2
idx     val
1     10.24
2     10.90
3     10.66
4     10.25
6     10.13
7     10.52

df_final
idx     val
0      0.32
1      0.54
2     10.90
3     10.66
4      0.26
5      0.76
6     10.13
7      0.23

Recap: I need to add the rows in df_2 for which the index is not already in df_1.


EDIT

Removed some indices in df_2 to illustrate the fact that all indices from df_1 are not covered in df_2.

Upvotes: 6

Views: 1946

Answers (2)

sodd
sodd

Reputation: 12923

You can achieve the wanted output by using the combine_first method of the DataFrame. From the documentation of the method:

Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns

Example usage:

import pandas as pd

df_1 = pd.DataFrame([0.32,0.54,0.26,0.76,0.23], columns=['val'], index=[0,1,4,5,7])
df_1.index.name = 'idx'

df_2 = pd.DataFrame([10.56,10.24,10.90,10.66,10.25,10.13,10.52], columns=['val'], index=[0,1,2,3,4,6,7])
df_2.index.name = 'idx'

df_final = df_1.combine_first(df_2)

This will give the desired result:

In [7]: df_final
Out[7]:
       val
idx       
0     0.32
1     0.54
2    10.90
3    10.66
4     0.26
5     0.76
6    10.13
7     0.23

Upvotes: 2

jezrael
jezrael

Reputation: 862921

You can use reindex with combine_first or fillna:

df = df_1.reindex(df_2.index).combine_first(df_2)
print (df)
       val
idx       
0     0.32
1     0.54
2    10.90
3    10.66
4     0.26
5     0.76
6    10.13
7     0.23

df = df_1.reindex(df_2.index).fillna(df_2)
print (df)
       val
idx       
0     0.32
1     0.54
2    10.90
3    10.66
4     0.26
5     0.76
6    10.13
7     0.23

Upvotes: 4

Related Questions