Merging and Filling in Pandas DataFrames

Question

I have two dataframes in Pandas. The columns are named the same and they have the same dimensions, but they have different (and missing) values.

I would like to merge based on one key column and take the max or non-missing data for each equivalent row.

import datetime

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'key':[1,3,5,7], 'a':[np.nan, 0, 5, 1], 'b':[datetime.datetime.today() - datetime.timedelta(days=x) for x in range(0,4)]})
df1
    a                          b  key
0 NaN 2014-08-01 10:37:23.828683    1
1   0 2014-07-31 10:37:23.828726    3
2   5 2014-07-30 10:37:23.828736    5
3   1 2014-07-29 10:37:23.828744    7

df2 = pd.DataFrame({'key':[1,3,5,7], 'a':[2, 0, np.nan, 3], 'b':[datetime.datetime.today() - datetime.timedelta(days=x) for x in range(2,6)]})
df2.loc[2,'b'] = pd.NaT
df2
    a                          b  key
0   2 2014-07-30 10:38:13.857203    1
1   0 2014-07-29 10:38:13.857253    3
2 NaN                        NaT    5
3   3 2014-07-27 10:38:13.857272    7

The end result would look like:

df_together
    a                          b  key
0   2 2014-07-30 10:38:13.857203    1
1   0 2014-07-29 10:38:13.857253    3
2   5 2014-07-30 10:37:23.828736    5
3   3 2014-07-27 10:38:13.857272    7

I hope my example covers all cases. If both dataframes have NaN (or NaT) values, then the result should also have NaN (or NaT) values. Try as I might, I can't get the pd.merge function to give what I want.

U2EF1 · Accepted Answer

In circumstances like these, it is often easiest to concatenate the two frames and take the maximum within each key:

df_together = pd.concat([df1, df2]).groupby('key').max()
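
To make the one-liner above easier to try, here is a minimal, self-contained sketch. The fixed reference time `now` is an assumption added for reproducibility (the question uses `datetime.datetime.today()`), and the `combine_first` variant at the end is an alternative that is not part of the answer. Note that `max()` is applied to every column, so where both frames have a timestamp in `b`, the later one is kept.

```python
import datetime

import numpy as np
import pandas as pd

# Fixed reference time so the example is reproducible (an assumption;
# the question uses datetime.datetime.today() instead).
now = datetime.datetime(2014, 8, 1, 10, 0, 0)

df1 = pd.DataFrame({
    'key': [1, 3, 5, 7],
    'a': [np.nan, 0, 5, 1],
    'b': [now - datetime.timedelta(days=x) for x in range(0, 4)],
})
df2 = pd.DataFrame({
    'key': [1, 3, 5, 7],
    'a': [2, 0, np.nan, 3],
    'b': [now - datetime.timedelta(days=x) for x in range(2, 6)],
})
df2.loc[2, 'b'] = pd.NaT

# Stack the frames, then take the column-wise maximum within each key.
# max() skips NaN/NaT, so a value present in only one frame survives.
df_together = pd.concat([df1, df2]).groupby('key').max()

# groupby() moves 'key' into the index; reset_index() restores the column.
df_together = df_together.reset_index()
print(df_together)

# Alternative (not from the answer): prefer df2's values and fall back to
# df1 only where df2 is missing, instead of taking a maximum.
preferred = (
    df2.set_index('key')
       .combine_first(df1.set_index('key'))
       .reset_index()
)
print(preferred)
```

The `groupby(...).max()` form suits the "take the max" reading of the question. The `combine_first` form suits the "take the non-missing value" reading, where one frame should win whenever it has data.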
