Jon

Reputation: 25

Combine Three DataFrames Using Pandas in Python

I am trying to combine three pandas DataFrames in Python. Below are the three DataFrames and my desired output (where NaN is null). Chaining two left merges did not work. Is there a sequence of two merges that produces the desired output? (It does not have to be one line of code.) Thanks!

df1
    +--------+
    | x      |
    +--------+
    | 1      |
    | 2      |
    | 3      |
    +--------+

df2
    +--------+---+
    | x      | b |
    +--------+---+
    | 1      | A |
    | 1      | B |
    | 1      | C |
    | 2      | D |
    | 2      | E |
    | 2      | F |
    | 3      | G |
    +--------+---+

df3
    +--------+---+
    | x      | c |
    +--------+---+
    | 1      | L |
    | 1      | M |
    | 2      | N |
    | 3      | O |
    | 3      | P |
    | 3      | Q |
    +--------+---+

df_result
    +----------------+-----+-----+
    | x              |  b  |  c  |
    +----------------+-----+-----+
    | 1              | A   | NaN |
    | 1              | B   | NaN |
    | 1              | C   | NaN |
    | 1              | NaN | L   |
    | 1              | NaN | M   |
    | 2              | D   | NaN |
    | 2              | E   | NaN |
    | 2              | F   | NaN |
    | 2              | NaN | N   |
    | 3              | G   | NaN |
    | 3              | NaN | O   |
    | 3              | NaN | P   |
    | 3              | NaN | Q   |
    +----------------+-----+-----+

The following attempts do not produce the df_result DataFrame shown above:

attempt1:
df_step1 = df1.merge(df2, on='x', how='left')
df_result = df_step1.merge(df3, on='x', how='left')
df_result

I have tried the above with various combinations of left, right, outer and inner merges, with no success.
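For reference, here is a minimal sketch of why chained merges cannot produce this shape, using the sample data from the tables above. A merge on x pairs every matching row of one frame with every matching row of the other, so b and c are always filled in together and the row count is a per-key product rather than a sum. The variable name chained is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3],
                    "b": ["A", "B", "C", "D", "E", "F", "G"]})
df3 = pd.DataFrame({"x": [1, 1, 2, 3, 3, 3],
                    "c": ["L", "M", "N", "O", "P", "Q"]})

chained = df1.merge(df2, on="x", how="left").merge(df3, on="x", how="left")

# Each x contributes (rows in df2) * (rows in df3) rows: 3*2 + 3*1 + 1*3 = 12,
# whereas the desired output has 7 + 6 = 13 rows.
print(len(chained))

# Every row carries both a b value and a c value, so no NaN appears.
print(chained.isna().any().any())
```

Changing the how argument does not alter this pairing for keys present in both frames, which is likely why none of the join types gave the stacked layout.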

attempt2:
df_result = pd.concat([df1, df2, df3], axis=1, sort=False)
df_result

This also does not produce the desired df_result DataFrame.
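As a hedged illustration of what the axis=1 attempt does with the sample data: concat with axis=1 aligns the frames on their row index and places them side by side, so the x column is repeated rather than shared. The name side_by_side is an assumption made for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3],
                    "b": ["A", "B", "C", "D", "E", "F", "G"]})
df3 = pd.DataFrame({"x": [1, 1, 2, 3, 3, 3],
                    "c": ["L", "M", "N", "O", "P", "Q"]})

# axis=1 glues the frames together column-wise, matching rows by index label
side_by_side = pd.concat([df1, df2, df3], axis=1)

print(side_by_side.shape)          # 7 rows (longest frame), 5 columns
print(list(side_by_side.columns))  # 'x' appears three times
```

Stacking rows instead (the default axis=0) is what lets b and c occupy separate rows.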

Perhaps I need a combination of concat and merge? Or, since each entry simply becomes its own row, I could write a loop that adds the matching rows for each x value to the result. Something like this:

pieces = []
for x_val in df1['x']:
    # rows of df2, then rows of df3, whose x matches the current df1 value
    pieces.append(df2[df2['x'] == x_val])
    pieces.append(df3[df3['x'] == x_val])
df_result = pd.concat(pieces, sort=False)

Upvotes: 2

Views: 982

Answers (2)

James Dellinger

Reputation: 1261

Concatenating the second and third DataFrames and then sorting by the x column produces a DataFrame that matches the expected output in df_result. df1 is not needed here, since every x value in df2 and df3 already appears in it:

df1 = pd.DataFrame({'x': [1,2,3]})
df2 = pd.DataFrame({'x': [1,1,1,2,2,2,3],
                    'b': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})
df3 = pd.DataFrame({'x': [1,1,2,3,3,3],
                    'c': ['L', 'M', 'N', 'O', 'P', 'Q']})


pd.concat([df2, df3], sort=False).sort_values('x').set_index('x', drop=True)

    b   c
x       
1   A   NaN
1   B   NaN
1   C   NaN
1   NaN L
1   NaN M
2   D   NaN
2   E   NaN
2   F   NaN
2   NaN N
3   G   NaN
3   NaN O
3   NaN P
3   NaN Q
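One caveat worth sketching: the order within each x group depends on the sort being stable, which the default sort kind does not promise. The variant below is a sketch under that assumption, not a correction of the approach. It passes kind="mergesort" (a stable sort), keeps only x values present in df1, and resets the index so x remains an ordinary column. The name result is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3],
                    "b": ["A", "B", "C", "D", "E", "F", "G"]})
df3 = pd.DataFrame({"x": [1, 1, 2, 3, 3, 3],
                    "c": ["L", "M", "N", "O", "P", "Q"]})

result = (
    pd.concat([df2, df3], sort=False)
      # keep only keys listed in df1 (a no-op for this sample data)
      .loc[lambda d: d["x"].isin(df1["x"])]
      # stable sort: df2 rows stay ahead of df3 rows within each x
      .sort_values("x", kind="mergesort")
      .reset_index(drop=True)
)
print(result)
```

With the sample data this gives 13 rows, with the b rows for each x followed by the c rows for that x.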

Upvotes: 1

Alex Yu

Reputation: 3537

Is this what you need? (Note that my sample data differs slightly from the data in the question.)

import pandas as pd
df2 = pd.DataFrame(data=[(1, 'A'),
 (1, 'B'),
 (1, 'C'),
 (2, 'D'),
 (2, 'E'),
 (3, 'F'),
 (3, 'G')], columns = ("x","b"))

df3 = pd.DataFrame(data=[(1, 'L'),
 (1, 'M'),
 (2, 'N'),
 (3, 'L'),
 (3, 'O'),
 (3, 'P'),
 (3, 'Q')], columns= ("x","c"))

df2["c"] = float('nan')
df3["b"] = float('nan')

df_result=pd.concat((df2,df3), sort=True)[["x","b","c"]]
df_result.sort_values("x")

I got:

   x     b     c
0  1    A  NaN
1  1    B  NaN
2  1    C  NaN
0  1  NaN    L
1  1  NaN    M
3  2    D  NaN
4  2    E  NaN
2  2  NaN    N
5  3    F  NaN
6  3    G  NaN
3  3  NaN    L
4  3  NaN    O
5  3  NaN    P
6  3  NaN    Q
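A possible simplification, sketched on a small made-up subset of the data: concat aligns frames on column names and fills missing columns with NaN by itself, so the placeholder columns do not have to be added by hand. Passing ignore_index=True also avoids the repeated index labels visible in the output above. The name combined and the reduced sample frames are assumptions for this sketch.

```python
import pandas as pd

# Small illustrative subset, not the full data from the question
df2 = pd.DataFrame({"x": [1, 1, 2], "b": ["A", "B", "D"]})
df3 = pd.DataFrame({"x": [1, 2], "c": ["L", "N"]})

# No df2["c"] or df3["b"] assignment: concat fills the gaps with NaN,
# and ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat([df2, df3], sort=False, ignore_index=True)

print(combined)
print(combined.index.is_unique)
```

Sorting by x afterwards works the same way as in the code above.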

Upvotes: 0
