Qubix
Qubix

Reputation: 4353

Combine multiple dataframes by summing certain columns in Pandas

Given three dataframes:

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': 'dog'})

how can one combine them into a single dataframe, by adding the values of a subset of given dataframes, such that the result becomes:

pd.DataFrame({'A': [8, 2], 'B': [10, 8], 'C': 'dog'})

for this example? My problem is that I also have columns which are identical, but cannot be summed (like 'C' here).

Upvotes: 3

Views: 746

Answers (2)

Yonas Kassa
Yonas Kassa

Reputation: 3690

you can do it as follows:

df = df3.copy()
df[['A','B']] = df1[['A','B']]+df2[['A','B']]+df3[['A','B']]

gives the following output, if you want you can:

:df

    A   B   C
0   8   10  dog
1   2   8   dog

Upvotes: 0

jezrael
jezrael

Reputation: 862641

One possible solution with sum if numeric values and if strings then join unique values per groups in GroupBy.agg after concat list of DataFrames:

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B    C
0  8  10  dog
1  2   8  dog

If possible different values like cat and dog:

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': ['cat','dog']})


f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B        C
0  8  10  dog,cat
1  2   8      dog

If need lists:

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else x.unique().tolist()
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B           C
0  8  10  [dog, cat]
1  2   8       [dog]

And for combination lists with scalars for nonnumeric values use custom function:

def f(x):
    if np.issubdtype(x.dtype, np.number):
        return x.sum()
    else:
        u = x.unique().tolist()
        if len(u) == 1:
            return u[0]
        else:
            return u

df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B           C
0  8  10  [dog, cat]
1  2   8         dog

Upvotes: 4

Related Questions