Reputation: 2929
I have multiple (3 or more) dataframes which I need to merge.
Example df1:
| clicks_US | dayofyear | weekday
2020-03-15 | 15000 | 75 | Sunday
2020-03-16 | 12000 | 76 | Monday
2020-03-17 | 10000 | 77 | Tuesday
Example df2:
| clicks_UK | dayofyear | weekday
2020-03-15 | 13000 | 75 | Sunday
2020-03-16 | 9000 | 76 | Monday
2020-03-17 | 8000 | 77 | Tuesday
Example df3:
| clicks_NZ | dayofyear | weekday
2020-03-15 | 7000 | 75 | Sunday
2020-03-16 | 5000 | 76 | Monday
2020-03-17 | 1000 | 77 | Tuesday
Desired output:
| clicks_US | clicks_UK |clicks_NZ | dayofyear | weekday
2020-03-15 | 15000 | 13000 | 7000 | 75 | Sunday
2020-03-16 | 12000 | 9000 | 5000 | 76 | Monday
2020-03-17 | 10000 | 8000 | 1000 | 77 | Tuesday
But the number of dfs to merge can be more sometimes.
Column I want to merge is the index one, datetime with ISO 8601 format.
Because I have a varying number of dfs I need to merge each time I searched for a flexible method but didn't find oney yet.
Is there an easy method to define a list with the different dfs and just call
dfs = [df1, df2, df3, df4]
pd.merge(dfs, how="inner")
without having to chain for each df so that I can keep the number flexible?
Upvotes: 0
Views: 100
Reputation: 28709
You can set the index to include dayofyear
and weekday
, as I assume from the data you shared that date is already an index. With the index set, You can merge multiple dataframes with a join
:
#create example data df1 as you shared in your question :
#same format for df2 and df3
data = """date_index clicks_US dayofyear weekday
2020-03-15 15000 75 Sunday
2020-03-16 12000 76 Monday
2020-03-17 10000 77 Tuesday"""
df1 = pd.read_csv(StringIO(data), sep='\s{2,}', engine='python', index_col=0)
df1
#columns to append to current index
cols = ['dayofyear','weekday']
#set indexes
first, *others = [frame.set_index(cols,append=True)
for frame in (df1,df2,df3)]
#join dataframes
outcome = first.join(others).reset_index()
outcome
date_index dayofyear weekday clicks_US clicks_UK clicks_NZ
0 2020-03-15 75 Sunday 15000 13000 7000
1 2020-03-16 76 Monday 12000 9000 5000
2 2020-03-17 77 Tuesday 10000 8000 1000
This can be extended to any number of dataframes
Upvotes: 0
Reputation: 7604
You can do it in 2 steps:
dfs = [df1, df2, df3]
df = pd.concat(dfs, axis=1)
df = df.loc[:,~df.columns.duplicated()]
print(df)
clicks_US dayofyear weekday clicks_UK clicks_NZ
Date
2020-03-15 15000 75 Sunday 13000 7000
2020-03-16 12000 76 Monday 9000 5000
2020-03-17 10000 77 Tuesday 8000 1000
Upvotes: 1