Bob Hopez
Bob Hopez

Reputation: 783

Merging two or more columns which don't overlap

Follow up to this post: Merging two columns which don't overlap and create new columns

import pandas as pd

df1 = pd.DataFrame([["2014", "q2", 2],
                   ["2013", "q1", 1],],
                   columns=('Year', 'Quarter', 'Value'))

df2 = pd.DataFrame([["2016", "q1", 3], 
                    ["2015", "q1", 3]],
                   columns=('Year', 'Quarter', 'Value'))


print(df1.merge(df2, on='Year', how='outer'))

Results in:

  Year   Quarter_x  Value_x  Quarter_y  Value_y
0  2014        q2        2       NaN     NaN
1  2013        q1        1       NaN     NaN
2  2016       NaN      NaN        q1     3
3  2015       NaN      NaN        q1     3

But I want to get this:

   Year    Quarter   Value 
0  2014        q2        2      
1  2013        q1        1     
2  2016        q1        3
3  2015        q1        3

Note: This doesn't produce the desired result... :(

print(df1.merge(df2, on=['Year', 'Quarter','Value'], how='outer').dropna())

   Year Quarter  Value
0  2014      q2      2
1  2013      q1      1

... using 'left' or right' or inner also don't cut it.

Upvotes: 1

Views: 233

Answers (3)

piRSquared
piRSquared

Reputation: 294278

pandas concat is much better suited for this.

pd.concat([df1, df2]).reset_index(drop=True)

   Year Quarter  Value
0  2014      q2      2
1  2013      q1      1
2  2016      q1      3
3  2015      q1      3

concat is intended to place one dataframe adjacent to another while keeping the index or columns aligned. In the default case, it keeps the columns aligned. Considering your example dataframes, the columns are aligned and your stated expected output shows df2 placed exactly after df1 where the columns are aligned. Every aspect of what you've asked for is exactly what concat was designed to provide. All I've done is point you to an appropriate function.

Upvotes: 2

Moe Chughtai
Moe Chughtai

Reputation: 384

You're looking for the append feature:

df_final = df1.append(df2)

Upvotes: 1

Stefan
Stefan

Reputation: 42875

Not sure what's happening here, but if I do

df1.merge(df2, on=['Year', 'Quarter', 'Value'], how='outer').dropna()

I get:

   Year Quarter  Value
0  2014      q2    2.0
1  2013      q1    1.0
2  2016      q1    3.0
3  2015      q1    3.0

You may want to take a look at the merge, join & concat docs.

The most 'intuitive' way for this is probably .append():

df1.append(df2)

   Year Quarter  Value
0  2014      q2    2.0
1  2013      q1    1.0
2  2016      q1    3.0
3  2015      q1    3.0

If you look into the source code, you'll find it calls concat behind the scenes.

Merge is useful and intended for cases where you have columns with overlapping values.

Upvotes: 2

Related Questions