Vega
Vega

Reputation: 2929

How to merge a variable number of dataframes on the same index

I have multiple (3 or more) dataframes which I need to merge.

Example df1:

           | clicks_US | dayofyear | weekday
2020-03-15 | 15000     | 75        | Sunday
2020-03-16 | 12000     | 76        | Monday
2020-03-17 | 10000     | 77        | Tuesday

Example df2:

           | clicks_UK | dayofyear | weekday
2020-03-15 | 13000     | 75        | Sunday
2020-03-16 | 9000      | 76        | Monday
2020-03-17 | 8000      | 77        | Tuesday

Example df3:

           | clicks_NZ | dayofyear | weekday
2020-03-15 | 7000      | 75        | Sunday
2020-03-16 | 5000      | 76        | Monday
2020-03-17 | 1000      | 77        | Tuesday

Desired output:

           | clicks_US | clicks_UK |clicks_NZ | dayofyear | weekday
2020-03-15 | 15000     | 13000     | 7000     | 75        | Sunday
2020-03-16 | 12000     | 9000      | 5000     | 76        | Monday
2020-03-17 | 10000     | 8000      | 1000     | 77        | Tuesday

But the number of dfs to merge can be more sometimes.

Column I want to merge is the index one, datetime with ISO 8601 format.

Because I have a varying number of dfs I need to merge each time I searched for a flexible method but didn't find oney yet.

Is there an easy method to define a list with the different dfs and just call

dfs = [df1, df2, df3, df4]
pd.merge(dfs, how="inner")

without having to chain for each df so that I can keep the number flexible?

Upvotes: 0

Views: 100

Answers (2)

sammywemmy
sammywemmy

Reputation: 28709

You can set the index to include dayofyear and weekday, as I assume from the data you shared that date is already an index. With the index set, You can merge multiple dataframes with a join :

#create example data df1 as you shared in your question : 
#same format for df2 and df3
data = """date_index  clicks_US  dayofyear  weekday
2020-03-15  15000      75         Sunday
2020-03-16  12000      76         Monday
2020-03-17  10000      77         Tuesday"""

df1 = pd.read_csv(StringIO(data), sep='\s{2,}', engine='python', index_col=0)
df1

#columns to append to current index
cols = ['dayofyear','weekday']

#set indexes
first, *others = [frame.set_index(cols,append=True)
                  for frame in (df1,df2,df3)]

#join dataframes
outcome = first.join(others).reset_index()

outcome


    date_index  dayofyear   weekday clicks_US   clicks_UK   clicks_NZ
0   2020-03-15      75      Sunday  15000       13000        7000
1   2020-03-16      76      Monday  12000       9000         5000
2   2020-03-17      77      Tuesday 10000       8000         1000

This can be extended to any number of dataframes

Upvotes: 0

NYC Coder
NYC Coder

Reputation: 7604

You can do it in 2 steps:

dfs = [df1, df2, df3]
df = pd.concat(dfs, axis=1)
df = df.loc[:,~df.columns.duplicated()]
print(df)

            clicks_US  dayofyear  weekday  clicks_UK  clicks_NZ
Date
2020-03-15      15000         75   Sunday      13000       7000
2020-03-16      12000         76   Monday       9000       5000
2020-03-17      10000         77  Tuesday       8000       1000

Upvotes: 1

Related Questions