Reputation: 43

Merge a list of dataframes by one column with reduce function

i have edited this post with the specific case:

i have a list of dataframes like this (note that df1 and df2 have a row in common)

df1

index	Date	A
0	2010-06-19	4
1	2010-06-20	3
2	2010-06-21	2
3	2010-06-22	1
4	2012-07-19	5

df2

index	Date	B
0	2012-07-19	5
1	2012-07-20	6

df3

index	Date	C
0	2020-06-19	5
1	2020-06-20	2
2	2020-06-21	9

df_list = [df1, df2, df3]

I would like to merge all dataframes in a single dataframe, without losing rows and placing nan where there are no things to merge. The criteria would be merging them by the column 'Date' (the column should have all the dates of all the merged dataframes, ordered by date).

The resulting dataframe should look like this:

Resulting Dataframe:

index	Date	A	B	C
0	2010-06-19	4	nan	nan
1	2010-06-20	3	nan	nan
2	2010-06-21	2	nan	nan
3	2010-06-22	1	nan	nan
4	2012-07-19	5	5	nan
5	2012-07-20	nan	6	nan
6	2020-06-19	nan	nan	5
7	2020-06-20	nan	nan	2
8	2020-06-21	nan	nan	9

I tried something like this:

from functools import reduce

df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['Date'], how='outer'), df_list)

BUT the resulting dataframe is not as expected (i miss some columns and is not ordered by date). I think i am missing something.

Thank you very much

Upvotes: 1

Answers (2)

Legolas DYD Leon Kennedy

Reputation: 43

For overlapping data, I had to add the option: Sort = TRUE in the lambda function. Seemed I was missing the order for big dataframes and I was only seeing the nan at the end and start of frames. Thank you all ;-)

from functools import reduce

df_merged = reduce(lambda left,right: pd.merge(left,right,on=['Date'],
        how='outer', sort=True), df_list)

Upvotes: 1

user17242583

Reputation:

Use pandas.concat(). It takes a list of dataframes, and appends common columns together, filling new columns with NaN as necessary:

new_df = pd.concat([df1, df2, df3])

Output:

>>> new_df
   index        Date    A    B    C
0      0  2010-06-19  4.0  NaN  NaN
1      1  2010-06-20  3.0  NaN  NaN
2      2  2010-06-21  2.0  NaN  NaN
3      3  2010-06-22  1.0  NaN  NaN
0      0  2012-07-19  NaN  5.0  NaN
1      1  2012-07-20  NaN  6.0  NaN
0      0  2020-06-19  NaN  NaN  5.0
1      1  2020-06-20  NaN  NaN  2.0
2      2  2020-06-21  NaN  NaN  9.0

Upvotes: 0

Merge a list of dataframes by one column with reduce function

Answers (2)

Related Questions