Reputation: 4353
Given three dataframes:
df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': 'dog'})
how can one combine them into a single dataframe, by adding the values of a subset of given dataframes, such that the result becomes:
pd.DataFrame({'A': [8, 2], 'B': [10, 8], 'C': 'dog'})
for this example? My problem is that I also have columns which are identical, but cannot be summed (like 'C'
here).
Upvotes: 3
Views: 746
Reputation: 3690
you can do it as follows:
df = df3.copy()
df[['A','B']] = df1[['A','B']]+df2[['A','B']]+df3[['A','B']]
gives the following output, if you want you can:
:df
A B C
0 8 10 dog
1 2 8 dog
Upvotes: 0
Reputation: 862641
One possible solution with sum
if numeric values and if strings then join unique values per groups in GroupBy.agg
after concat
list of DataFrame
s:
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 dog
1 2 8 dog
If possible different values like cat
and dog
:
df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': ['cat','dog']})
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 dog,cat
1 2 8 dog
If need lists:
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else x.unique().tolist()
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 [dog, cat]
1 2 8 [dog]
And for combination lists with scalars for nonnumeric values use custom function:
def f(x):
if np.issubdtype(x.dtype, np.number):
return x.sum()
else:
u = x.unique().tolist()
if len(u) == 1:
return u[0]
else:
return u
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 [dog, cat]
1 2 8 dog
Upvotes: 4