Roman Lents
Roman Lents

Reputation: 131

Iterate through two dataframes and filter based on conditions in columns

I have two DataFrames like that below:

df_1 = pd.DataFrame({
    'wag': [111, 111, 222], 
    'date': ['2019-08-30', '2019-09-20', '2019-08-10']
})

and

df_2 = pd.DataFrame({
    'wag': [111, 111, 111, 222, 222, 333, 333, 333], 
    'msr_date': ['2019-08-30', '2019-08-30', '2019-08-30', 
                 '2019-08-30', '2019-08-30', '2019-08-30', 
                 '2019-08-30', '2019-08-30'], 
    'measurements': [1, 2, 3, 4, 5, 6, 7, 8]
})

df_1 and df_2 are below:

wag date
111 2019-08-30
111 2019-09-20
222 2019-08-10
wag msr_date measurements
111 2019-03-29 1
111 2019-03-19 2
111 2019-03-10 3
222 2019-09-30 4
222 2019-09-20 5
333 2019-08-30 6
333 2019-08-30 7
333 2019-09-20 8

So, I have two DataFrames df_1 and df_2 and I want to iterate through them at the same time and filter df_2. I need to iterate based on wag in both tables.

If wag in df_1 is in df_2 then I have to compare date and mrs_date and

For example, here I first check the first row of df_1 and if mrs_date <= date in df_2 then I append all rows to df_new. If not then I go to the next date for that wag. If we have one if not then check another wag. So finally, for this example I will get df_new as below:

wag msr_date measurements
111 2019-03-29 1
111 2019-03-19 2
111 2019-03-10 3

In reality, there are so many values in df_1 and in df_2 so for_loop should be quite complicated and I couldn't find the way how to do it. If it is not clear how I described the task please ask.

Upvotes: 0

Views: 1731

Answers (2)

Zen
Zen

Reputation: 145

Using a for loop:

new_df = pd.DataFrame(data=None, columns=df_2.columns)

for idx, row in df_1.iterrows():
  for idx2, row2 in df_2.iterrows():
    if row['wag'] == row2['wag'] and row2['msr_date'] <= row['date']:
      new_df = new_df.append(row2).drop_duplicates()

Upvotes: 0

Corralien
Corralien

Reputation: 120429

Merge your 2 dataframes on wag column then keep only rows that match your condition msr_date <= date. Finally, drop the column date from df_1 and remove duplicates rows.

>>> df_2.merge(df_1, on='wag', how='left') \
        .query('msr_date <= date') \
        .drop(columns='date').drop_duplicates()

   wag    msr_date  measurements
0  111  2019-03-29             1
2  111  2019-03-19             2
4  111  2019-03-10             3

Upvotes: 2

Related Questions