crt
crt

Reputation: 61

Finding first row in dataframe that matches condition

I have two dataframes. My first dataframe has two columns, which I would like to use to create a condition and return a value from the second dataframe.

df1 = (['a', 'a', 'b', 'c'], [0.4, 0.9, 0.1, 0.6])

df2 = (['a', 'a', 'b', 'b', 'c', 'c'], [0.2, 0.6, 0.3, 0.8, 0.1, 0.4], 
              [500, 200, 900, 400, 250, 800])

For each row in df1, I would like to look up df2 by the two columns of data. The first condition is to match the letter in df1 with df2. The second condition is to lookup the first number row in df2 where the number is greater than df1. If no number is greater, take the first row in df2 that matches the letter.

My target output is:

Apologies for any poor formatting with my question, please provide any advice regarding posting (its my first one).

Thanks very much!

Upvotes: 4

Views: 6438

Answers (2)

jezrael
jezrael

Reputation: 862511

I think better is avoid iteration.
I merged data from df1 and df2 together by column a. Output has twice more rows as original df1. Then condition is applied by np.where and output is cast to integer. Series s contained output, each odd value. (because merging with one rows from df1 with two rows from df2)

#   a    b
#0  a  0.4
#1  a  0.9
#2  b  0.1
#3  c  0.6

#   a    c    d
#0  a  0.2  500
#1  a  0.6  200
#2  b  0.3  900
#3  b  0.8  400
#4  c  0.1  250
#5  c  0.4  800  

#merged data - intersection df1 and df2 by column a
df = pd.merge(df1, df2, on=['a'], how='inner')
#apply condition
s = np.where(df['c']< df['b'], df['d'], df['d'].shift(1)).astype('int')
#odd values
s = s[1::2]
#[500 200 900 800]

#if need add data do df1 as column e
#df1['e'] = pd.Series(s, index=df1.index)

iteration solution (Delforge):

100 loops, best of 3: 4.67 ms per loop

merged solution (me):

100 loops, best of 3: 1.93 ms per loop

Upvotes: 4

Delforge
Delforge

Reputation: 792

You may iterate df1 and make a selection base on column 0 letter and greater than comparison between the two columns 1. Assuming that your column 1 values are ascending, if the selected dataframe is not empty store the last value of the column 2. If empty get the first value of df2 col 2 that match column 0 selection.

Here is an example, I store the results in a dictionnary.

Example:

Your data

import pandas as pd

df1 = pd.DataFrame(data = [['a', 'a', 'b', 'c'], [0.4, 0.9, 0.1, 0.6]]).transpose()

df2 = pd.DataFrame(data = [['a', 'a', 'b', 'b', 'c', 'c'], [0.2, 0.6, 0.3, 0.8, 0.1, 0.4], [500, 200, 900, 400, 250, 800]]).transpose()

The iteration:

results = {} # dictionnary to store results

for i, row in df1.iterrows():
    select = df2[(df2[0] == row[0]) & (df2[1] < row[1])] # selection

    if not select.empty:
        results[i] = select[2].iloc[-1] # storing last value of column 2
    else:
        results[i] = df2[df2[0] == row[0]][2].iloc[0] # storing first value if no greater than match

print results # {0: 500, 1: 200, 2: 900, 3: 800}

Upvotes: 1

Related Questions