Reputation: 61
I have two dataframes. My first dataframe has two columns, which I would like to use to create a condition and return a value from the second dataframe.
df1 = (['a', 'a', 'b', 'c'], [0.4, 0.9, 0.1, 0.6])
df2 = (['a', 'a', 'b', 'b', 'c', 'c'], [0.2, 0.6, 0.3, 0.8, 0.1, 0.4],
[500, 200, 900, 400, 250, 800])
For each row in df1, I would like to look up df2 by the two columns of data. The first condition is to match the letter in df1 with df2. The second condition is to lookup the first number row in df2 where the number is greater than df1. If no number is greater, take the first row in df2 that matches the letter.
My target output is:
Apologies for any poor formatting with my question, please provide any advice regarding posting (its my first one).
Thanks very much!
Upvotes: 4
Views: 6438
Reputation: 862511
I think better is avoid iteration.
I merged data from df1
and df2
together by column a
. Output has twice more rows as original df1
. Then condition is applied by np.where and output is cast to integer. Series s
contained output, each odd value. (because merging with one rows from df1
with two rows from df2
)
# a b
#0 a 0.4
#1 a 0.9
#2 b 0.1
#3 c 0.6
# a c d
#0 a 0.2 500
#1 a 0.6 200
#2 b 0.3 900
#3 b 0.8 400
#4 c 0.1 250
#5 c 0.4 800
#merged data - intersection df1 and df2 by column a
df = pd.merge(df1, df2, on=['a'], how='inner')
#apply condition
s = np.where(df['c']< df['b'], df['d'], df['d'].shift(1)).astype('int')
#odd values
s = s[1::2]
#[500 200 900 800]
#if need add data do df1 as column e
#df1['e'] = pd.Series(s, index=df1.index)
iteration solution (Delforge):
100 loops, best of 3: 4.67 ms per loop
merged solution (me):
100 loops, best of 3: 1.93 ms per loop
Upvotes: 4
Reputation: 792
You may iterate df1 and make a selection base on column 0 letter and greater than comparison between the two columns 1. Assuming that your column 1 values are ascending, if the selected dataframe is not empty store the last value of the column 2. If empty get the first value of df2 col 2 that match column 0 selection.
Here is an example, I store the results in a dictionnary.
Example:
Your data
import pandas as pd
df1 = pd.DataFrame(data = [['a', 'a', 'b', 'c'], [0.4, 0.9, 0.1, 0.6]]).transpose()
df2 = pd.DataFrame(data = [['a', 'a', 'b', 'b', 'c', 'c'], [0.2, 0.6, 0.3, 0.8, 0.1, 0.4], [500, 200, 900, 400, 250, 800]]).transpose()
The iteration:
results = {} # dictionnary to store results
for i, row in df1.iterrows():
select = df2[(df2[0] == row[0]) & (df2[1] < row[1])] # selection
if not select.empty:
results[i] = select[2].iloc[-1] # storing last value of column 2
else:
results[i] = df2[df2[0] == row[0]][2].iloc[0] # storing first value if no greater than match
print results # {0: 500, 1: 200, 2: 900, 3: 800}
Upvotes: 1