Jake
Jake

Reputation: 2912

Merge a list of pandas dataframes

I have a list of data frames and I need to merge them together using a unique column (date). Field names are different so concat is out.

I can manually use df[0].merge(df[1],on='Date').merge(df[3],on='Date) etc. to merge each df one by one, but the issue is that the number of data frames in the list differs with user input.

Is there any way to merge that just combines all data frames in a list in one go? Or perhaps some for loop that does that?

Upvotes: 59

Views: 85417

Answers (2)

cottontail
cottontail

Reputation: 23131

pd.DataFrame.merge() can be called inside reduce as well.

from functools import reduce
df = reduce(lambda x, y: x.merge(y, on='Date'), dfList)

The equivalent explicit loop is readable:

df = dfList[0]
for d in dfList[1:]:
    df = df.merge(d, on='Date')

pd.concat and join can be used for this task as well. Just need to make the merge key the index of each dataframe.

df = pd.concat([d.set_index('Date') for d in dfList], axis=1, join='inner').reset_index()

# or
df = pd.DataFrame().join([d.set_index('Date') for d in dfList], how='outer').dropna().reset_index()

# or 
dfList = [d.set_index('Date') for d in dfList]
df = dfList[0].join(dfList[1:], how='inner').reset_index()

For the toy example,

df1 = pd.DataFrame({'Date': [1,2,3,4], 'Value1': [2,3,3,4]})
df2 = pd.DataFrame({'Date': [1,4,2], 'Value2': [2,3,3]})
df3 = pd.DataFrame({'Date': [3,2,4,1,6], 'Value3': [1,2,3,3,4]})
dfList = [df1, df2, df3]

all of the options above produce:

   Date  Value1  Value2  Value3
0     1       2       2       3
1     2       3       3       2
2     4       4       3       3

Upvotes: 5

akuiper
akuiper

Reputation: 214957

You can use reduce function where dfList is your list of data frames:

import pandas as pd
from functools import reduce
reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)

As a demo:

df = pd.DataFrame({'Date': [1,2,3,4], 'Value': [2,3,3,4]})
dfList = [df, df, df]
dfList

# [   Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4,    Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4,    Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4]

reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)
#   Date  Value_x  Value_y  Value
# 0    1        2        2      2
# 1    2        3        3      3
# 2    3        3        3      3
# 3    4        4        4      4

Upvotes: 130

Related Questions