Reputation: 154
I am trying to concatenate four DataFrames. df has a unique index, and the other three have duplicated values in their index.
Here's my code:
import pandas as pd
data = {'id':['1','2','3','4','5','6'],
'name':['A1','A2','A2','A1','A3','A3'],
'date_create':['28-01-2020','15-02-2020','15-03-2020',
'25-03-2020','01-04-2020','30-05-2020']}
df = pd.DataFrame(data)
data = {'userid':['1','2','3','3','5','6'],
'likesDate':['28-01-2020','15-02-2020','15-03-2020',
'25-03-2020','01-04-2020','30-05-2020']}
df1 = pd.DataFrame(data)
data = {'userid':['1','1','2','4','6','6'],
'dislikesDate':['8-01-2020','1-02-2020','15-03-2020',
'25-03-2020','01-04-2020','30-05-2020']}
df2 = pd.DataFrame(data)
data = {'userid':['1','2','2','4','4','6'],
'DeleteDate':['28-01-2020','15-02-2020','15-03-2020',
'25-03-2020','05-04-2020','23-05-2020']}
df3 = pd.DataFrame(data)
df.set_index('id', inplace=True)
df1.set_index('userid', inplace=True)
df2.set_index('userid', inplace=True)
df3.set_index('userid', inplace=True)
df_final = pd.concat([df,df1, df2,df3],axis=1)
print(df_final)
I am getting this error:
ValueError: Shape of passed values is (7, 5), indices imply (6, 5)
What am I doing wrong, and where?
Upvotes: 0
Views: 1065
Reputation: 29635
IIUC, you can chain join instead of using concat, since you have duplicated index values. With only three dataframes to join onto df, you can write it out in full:
df_final = df.join(df1).join(df2).join(df3)
print(df_final.head())
# name date_create likesDate dislikesDate DeleteDate
# 1 A1 28-01-2020 28-01-2020 8-01-2020 28-01-2020
# 1 A1 28-01-2020 28-01-2020 1-02-2020 28-01-2020
# 2 A2 15-02-2020 15-02-2020 15-03-2020 15-02-2020
# 2 A2 15-02-2020 15-02-2020 15-03-2020 15-03-2020
# 3 A2 15-03-2020 15-03-2020 NaN NaN
If you want a more flexible solution that can accept more dataframes, use reduce:
from functools import reduce
df_final = (
    reduce(lambda x, y: x.join(y), [df, df1, df2, df3])  # same as chaining joins
    .rename_axis('user_id').reset_index()  # user id back as a column if needed
)
print(df_final)
# user_id name date_create likesDate dislikesDate DeleteDate
# 0 1 A1 28-01-2020 28-01-2020 8-01-2020 28-01-2020
# 1 1 A1 28-01-2020 28-01-2020 1-02-2020 28-01-2020
# 2 2 A2 15-02-2020 15-02-2020 15-03-2020 15-02-2020
# 3 2 A2 15-02-2020 15-02-2020 15-03-2020 15-03-2020
# 4 3 A2 15-03-2020 15-03-2020 NaN NaN
# 5 3 A2 15-03-2020 25-03-2020 NaN NaN
# 6 4 A1 25-03-2020 NaN 25-03-2020 25-03-2020
# 7 4 A1 25-03-2020 NaN 25-03-2020 05-04-2020
# 8 5 A3 01-04-2020 01-04-2020 NaN NaN
# 9 6 A3 30-05-2020 30-05-2020 01-04-2020 23-05-2020
# 10 6 A3 30-05-2020 30-05-2020 30-05-2020 23-05-2020
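To see why join is the safer choice here, it may help to check which frames actually have a unique index, and how a join expands the rows when they do not. The following is a minimal sketch on a cut-down, hypothetical version of the data (the small df and df1 below are illustrative, not the frames from the question); it only uses Index.is_unique and DataFrame.join.

```python
import pandas as pd

# Hypothetical, reduced frames: df has a unique index, df1 repeats '3'.
df = pd.DataFrame({'name': ['A1', 'A2', 'A3']},
                  index=pd.Index(['1', '2', '3'], name='id'))
df1 = pd.DataFrame({'likesDate': ['28-01-2020', '15-03-2020', '25-03-2020']},
                   index=pd.Index(['1', '3', '3'], name='userid'))

# Report which frames have duplicated index labels.
frames = {'df': df, 'df1': df1}
unique = {name: frame.index.is_unique for name, frame in frames.items()}
print(unique)

# A left join on the index keeps every row of df and repeats it once per
# matching row in df1; labels with no match get NaN.
joined = df.join(df1)
print(joined)
```

Here '3' matches two rows in df1, so it appears twice in the result, while '2' has no match and gets NaN. The same effect explains why the full result above has 11 rows rather than 6: each id is repeated once for every combination of matching rows in the other frames.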
Upvotes: 1