Reputation: 8905
I saw many post talk about use reduce to merge multiple dataframe. I try below code, it works with three dataframes, if it greater than three, tan, error reported:
pandas.errors.MergeError: Passing 'suffixes' which cause duplicate columns {'value_x'} is not allowed.
code:
import pandas as pd
from functools import reduce
data1 = [
['A',1],
['B',2],
['C',3]]
data2 = [
['A',4],
['B',5],
['C',6]]
data3 = [
['A',7],
['B',8],
['C',9]]
data4 = [
['A',10],
['B',11],
['C',12]]
dfs = []
for data in [data1, data2, data3, data4]:
df = pd.DataFrame(data, columns=['name', 'value'])
dfs.append(df)
dfm = reduce(
lambda left, right: pd.merge(
left, right,
on=['name'],
how='outer',
suffixes=('_x', '_y')),
dfs)
print(dfm)
What's proper way to fix this error?
Upvotes: 0
Views: 72
Reputation: 25418
Another approach, which renames the value
column as the list of dfs
is being created. Afterwards, just use reduce
:
# creating the list of dfs and renaming the 'value' column
dfs = []
for idx, data in enumerate([data1,data2,data3,data4]):
df = pd.DataFrame(data,columns=['name',f'value_{idx+1}'])
dfs.append(df)
reduce(lambda left, right: pd.merge(left, right, on='name', how='outer'), dfs)
Alternatively, we can rename column value
inside reduce
, by using list comprehension
:
reduce(
lambda left, right: pd.merge(left, right, on='name', how='outer'),
# renaming columns of dfs
[x.rename(columns={'value': f'value_{idx+1}'}) for idx, x in enumerate(dfs)])
Output:
name value_1 value_2 value_3 value_4
0 A 1 4 7 10
1 B 2 5 8 11
2 C 3 6 9 12
Upvotes: 0
Reputation: 262224
The problem is that after the third iteration, the left dataframe contains value_x
and value
as columns, then right one contains value
, but adding the _x
suffix would cause value
to become value_x
that is already existing.
You should rather use concat
here:
out = (pd.concat([d.set_index('name').add_suffix(f'_{i}')
for i, d in enumerate(dfs, start=1)],
axis=1)
.reset_index()
)
Output:
name value_1 value_2 value_3 value_4
0 A 1 4 7 10
1 B 2 5 8 11
2 C 3 6 9 12
To achieve the same with reduce
would be a bit convoluted:
dfm = reduce(lambda left, right:
pd.merge(left, right[1], on=['name'],
how='outer', suffixes=(None, f'_{right[0]}')),
list(enumerate(dfs[1:], start=2)),
dfs[0])
Output:
name value value_2 value_3 value_4
0 A 1 4 7 10
1 B 2 5 8 11
2 C 3 6 9 12
Upvotes: 0