Reputation: 1273
I have two pandas.DataFrame
objects with MultiIndex
indices. Some of the index values are shared with the two dataframes, but not all. I would like to merge these two data frames and take the sum of one of the columns if the row (index value) exists. Otherwise, keep the row and column value as it exists.
: This is close, but does not use MultiIndex
I've tried to create an example:
def mklbl(prefix,n):
try:
return ["%s%s" % (prefix,i) for i in range(n)]
except:
return ["%s%s" % (prefix,i) for i in n]
mi1 = pd.MultiIndex.from_product([mklbl('A',4), mklbl('C',2)])
mi2 = pd.MultiIndex.from_product([mklbl('A',[2,3,4]), mklbl('C',2)])
df2 = pd.DataFrame({'b':np.arange(len(mi2)), 'c':np.arange(len(mi2))[::-1]},
index=mi2).sort_index().sort_index(axis=1)
df1 = pd.DataFrame({'a':np.arange(len(mi1)), 'b':np.arange(len(mi1))[::-1]},
index=mi1).sort_index().sort_index(axis=1)
The individual DataFrame
objects look like:
In [117]: df1
Out[117]:
a b
A0 C0 0 7
C1 1 6
A1 C0 2 5
C1 3 4
A2 C0 4 3
C1 5 2
A3 C0 6 1
C1 7 0
and
In [118]: df2
Out[118]:
b c
A2 C0 0 5
C1 1 4
A3 C0 2 3
C1 3 2
A4 C0 4 1
C1 5 0
What I want to do is merge these two, and sum the 'b' column, but keep all rows whether they exist in one or the other dataframe:
In [117]: df_merged_bsummed
Out[117]:
a b c
A0 C0 0 7 NaN
C1 1 6 NaN
A1 C0 2 5 NaN
C1 3 4 NaN
A2 C0 4 3 5
C1 5 3 4
A3 C0 6 3 3
C1 7 3 2
A4 C0 NaN 4 1
C1 NaN 5 0
Upvotes: 4
Views: 7547
Reputation: 353059
In this particular case, I think you could just add them and use fill_value=0
, relying on the default alignment behaviour:
>>> df1.add(df2,fill_value=0)
a b c
A0 C0 0 7 NaN
C1 1 6 NaN
A1 C0 2 5 NaN
C1 3 4 NaN
A2 C0 4 3 5
C1 5 3 4
A3 C0 6 3 3
C1 7 3 2
A4 C0 NaN 4 1
C1 NaN 5 0
There being only one column in common, only one is summed, but if you wanted to make that explicit you could instead do something like
>>> m = pd.concat([df1, df2],axis=1)
>>> m["b"] = m.pop("b").sum(axis=1)
>>> m
a c b
A0 C0 0 NaN 7
C1 1 NaN 6
A1 C0 2 NaN 5
C1 3 NaN 4
A2 C0 4 5 3
C1 5 4 3
A3 C0 6 3 3
C1 7 2 3
A4 C0 NaN 1 4
C1 NaN 0 5
Upvotes: 8