user_unknown
user_unknown

Reputation: 33

Merging 2 pandas DataFrames with certain conditions

I can't figure out how to merge 2 pandas DataFrames with certain conditions.

I have this:

import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'A', 'A', 'B' ,'B' ,'B'],
                    'Date': ['10/15/2019', '10/16/2019', '10/17/2019', '10/15/2019', '10/16/2019', '10/17/2019'], 
                    'Value 1': [101, 102, 103, 201, 202, 203],
                    'Value 2': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']})

df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B' ,'C'],
                    'Date': ['10/14/2019', '10/15/2019', '10/13/2019', '10/18/2019', '10/18/2019'], 
                    'Value 3': [2, 2, 22, 44, 222]})

I need to get to this

I need merge only names that exist in df1, but add dates that exist in df2. However, those dates should be greater than 10/13/2019.

The closest I could get is

df_m = pd.merge(df1, df2, how='outer', left_on=['Name', 'Date'], right_on=['Name', 'Date'], sort=True)

But this includes everything ('C' from df2 - I don't need it and date 10/13/2019 for 'B' - I don't need it either).

I'd really appreciate any help.

Edited: We can drop the condition of dates being greater than 10/13/2019 - I figured this out by just created another df2_mod where I didn't include dates I don't need. However, some dates that exist in df2_mod are not in df1 and I need to merge them with values 3. However I don't need Names from df2 into df1 that don't exist in df1

Upvotes: 0

Views: 73

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195593

Here is one solution (I don't filter for dates being greater than 10/13/2019, as stated in the updated question):

  1. Fist we filter df2 and drop all 'Names' that don't exist in df1
  2. Do outer join with df1 and df2_filtered on Name and Date columns
  3. Sort the values and recreate index

import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'A', 'A', 'B' ,'B' ,'B'],
                    'Date': ['10/15/2019', '10/16/2019', '10/17/2019', '10/15/2019', '10/16/2019', '10/17/2019'],
                    'Value 1': [101, 102, 103, 201, 202, 203],
                    'Value 2': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']})

df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B' ,'C'],
                    'Date': ['10/14/2019', '10/15/2019', '10/13/2019', '10/18/2019', '10/18/2019'],
                    'Value 3': [2, 2, 22, 44, 222]})

df2_filtered = df2[df2['Name'].isin(df1['Name'])]       # we want df2 rows with `Names` that exists in df1
print(df1.merge(df2_filtered, on=['Name', 'Date'], how='outer').sort_values(['Name', 'Date']).reset_index(drop=True))

Prints:

  Name        Date  Value 1 Value 2  Value 3
0    A  10/14/2019      NaN     NaN      2.0
1    A  10/15/2019    101.0      A1      2.0
2    A  10/16/2019    102.0      A2      NaN
3    A  10/17/2019    103.0      A3      NaN
4    B  10/13/2019      NaN     NaN     22.0
5    B  10/15/2019    201.0      B1      NaN
6    B  10/16/2019    202.0      B2      NaN
7    B  10/17/2019    203.0      B3      NaN
8    B  10/18/2019      NaN     NaN     44.0

Upvotes: 1

Related Questions