Jake Wagner

Reputation: 826

Pandas Merge Result Output Next Row

Suppose I have two dataframes

df_1

city           state           salary
New York        NY             85000
Chicago         IL             65000
Miami           FL             75000
Dallas          TX             78000
Seattle         WA             96000

df_2

city           state           taxes
New York        NY             15000
Chicago         IL             5000
Miami           FL             6500

Next, I join the two dataframes

joined_df = df_1.merge(df_2, how='inner', left_on=['city'], right_on = ['city'])

The Result:

joined_df

city           state           salary           city           state        taxes
New York        NY             85000           New York          NY         15000
Chicago         IL             65000           Chicago           IL         5000
Miami           FL             75000           Miami             FL         6500
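
For anyone reproducing this, here is a minimal sketch that builds the two frames from the tables above and runs the same merge. One caveat: because only city is used as the key, pandas keeps both state columns and, as far as I know, disambiguates them with its default _x / _y suffixes rather than repeating the name as in the table above.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df_1 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami", "Dallas", "Seattle"],
    "state": ["NY", "IL", "FL", "TX", "WA"],
    "salary": [85000, 65000, 75000, 78000, 96000],
})
df_2 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami"],
    "state": ["NY", "IL", "FL"],
    "taxes": [15000, 5000, 6500],
})

# Same call as in the question: inner join on city only.
joined_df = df_1.merge(df_2, how="inner", left_on=["city"], right_on=["city"])

# The overlapping non-key column "state" is suffixed on each side.
print(joined_df.columns.tolist())
```

Passing on=["city", "state"] instead would treat both columns as keys and keep a single state column.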

Is there any way I can stack the two dataframes on top of each other, joining on the city, instead of extending each row horizontally, like below:

Requested:

joined_df

city             state         salary          taxes
New York          NY            85000
New York          NY                           15000
Chicago           IL            65000
Chicago           IL                           5000
Miami             FL            75000
Miami             FL                           6500

How can I do this in pandas?

Upvotes: 0

Views: 83

Answers (3)

You can use pd.concat to achieve that (DataFrame.append was a shortcut for concat, but it was deprecated in pandas 1.4 and removed in 2.0):

result = pd.concat([df_1, df_2], sort=False)

If your dataframes have overlapping indexes, you can use:

result = pd.concat([df_1, df_2], ignore_index=True, sort=False)

Also, you can look for more information here

UPDATE: After concatenating your dataframes, you can filter the result to keep only the rows whose city appears in both dataframes:

result = result.loc[result['city'].isin(df_1['city'])
       & result['city'].isin(df_2['city'])]
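
Putting the pieces of this answer together, a self-contained sketch might look like the following. It is written with pd.concat so that it runs on current pandas, and the sample frames are rebuilt from the question. The name in_both is only an illustrative helper, not something from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami", "Dallas", "Seattle"],
    "state": ["NY", "IL", "FL", "TX", "WA"],
    "salary": [85000, 65000, 75000, 78000, 96000],
})
df_2 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami"],
    "state": ["NY", "IL", "FL"],
    "taxes": [15000, 5000, 6500],
})

# Stack the frames vertically; columns missing on one side become NaN.
result = pd.concat([df_1, df_2], ignore_index=True, sort=False)

# Keep only rows whose city occurs in both original frames.
in_both = result["city"].isin(df_1["city"]) & result["city"].isin(df_2["city"])
result = result.loc[in_both]

print(result)
```

Note that this keeps all salary rows first and all taxes rows after them; it does not interleave the rows per city the way the requested output does.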

Upvotes: 1

not_speshal

Reputation: 23166

Try with stack():

stacked = df_1.merge(df_2, on=["city", "state"]).set_index(["city", "state"]).stack()
output = pd.concat([stacked.where(stacked.index.get_level_values(-1)=="salary"), 
                    stacked.where(stacked.index.get_level_values(-1)=="taxes")], 
                   axis=1,
                   keys=["salary", "taxes"]) \
           .droplevel(-1) \
           .reset_index()

>>> output
       city state   salary    taxes
0  New York    NY  85000.0      NaN
1  New York    NY      NaN  15000.0
2   Chicago    IL  65000.0      NaN
3   Chicago    IL      NaN   5000.0
4     Miami    FL  75000.0      NaN
5     Miami    FL      NaN   6500.0
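
To see why the two where() masks work, it may help to look at the intermediate stacked Series on its own. This is only an illustrative sketch using the question's sample data: after stack(), the former column names (salary, taxes) become the innermost index level, which is what get_level_values(-1) compares against.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami", "Dallas", "Seattle"],
    "state": ["NY", "IL", "FL", "TX", "WA"],
    "salary": [85000, 65000, 75000, 78000, 96000],
})
df_2 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami"],
    "state": ["NY", "IL", "FL"],
    "taxes": [15000, 5000, 6500],
})

# Inner merge on both keys, then move salary/taxes from columns into the index.
stacked = df_1.merge(df_2, on=["city", "state"]).set_index(["city", "state"]).stack()

# Innermost index level alternates between the two former column names.
print(stacked.index.get_level_values(-1).tolist())
print(stacked)
```

Each where() call keeps the values for one of those labels and blanks the rest, so concatenating the two masked copies side by side yields one salary column and one taxes column with alternating NaNs.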

Upvotes: 0

Kyle Parsons

Reputation: 1525

If both city and state need to match, we can use merge to restrict each dataframe to the rows present in both before calling concat.

# merge() with no keys joins on the shared columns (city and state)
rel_df_1 = df_1.merge(df_2)[df_1.columns]
rel_df_2 = df_2.merge(df_1)[df_2.columns]
# Stack the two filtered frames, then group each city's rows together
df = pd.concat([rel_df_1, rel_df_2]).sort_values(['city', 'state'])
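
If the row order in the question matters (cities in their original order, salary row before taxes row), one possible variation is sketched below. It is an assumption layered on top of this answer, not part of it: the order mapping and the stable mergesort on a single key are my own additions, and the sample frames are rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami", "Dallas", "Seattle"],
    "state": ["NY", "IL", "FL", "TX", "WA"],
    "salary": [85000, 65000, 75000, 78000, 96000],
})
df_2 = pd.DataFrame({
    "city": ["New York", "Chicago", "Miami"],
    "state": ["NY", "IL", "FL"],
    "taxes": [15000, 5000, 6500],
})

# Restrict each frame to (city, state) pairs present in both.
rel_df_1 = df_1.merge(df_2)[df_1.columns]
rel_df_2 = df_2.merge(df_1)[df_2.columns]

# Salary rows first, then taxes rows.
stacked = pd.concat([rel_df_1, rel_df_2], ignore_index=True)

# Hypothetical helper: rank each city by its first position in df_1.
order = {city: i for i, city in enumerate(df_1["city"])}

# A stable sort keeps the salary row ahead of the taxes row within each city.
stacked = (
    stacked.sort_values("city", key=lambda s: s.map(order), kind="mergesort")
           .reset_index(drop=True)
)

print(stacked)
```

Sorting alphabetically, as in the answer, would put Chicago first; mapping cities to their position in df_1 is just one way to keep the original order.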

Upvotes: 1
