Reputation: 597
I have two dateframe (df1 & df2), i'm trying to figure out how to use conditions from df2 to extract values from df1 and use the extracted values in df2.
df1 = values to exact from
df2 = conditions for exaction and df where the extracted values are used
conditions: df2.HJ = df1HJ & df2.JK = df1 P colum
example if df2(df2.HJ = 99 & df2.JK = P3); Ans = 67 (from df1)
df1
╔════╦════╦══════╦══════╦══════╦══════╗ ║ HJ ║ P1 ║ P2 ║ P3 ║ P4 ║ P5 ║ ╠════╬════╬══════╬══════╬══════╬══════╣ ║ 5 ║ 51 ║ 33 ║ 21 ║ 31 ║ 13 ║ ║ 11 ║ 66 ║ 45 ║ 21 ║ 49 ║ 58 ║ ║ 21 ║ 7 ║ 55 ║ 56 ║ 67 ║ 73 ║ ║ 99 ║ 0 ║ 76 ║ 67 ║ 98 ║ 29 ║ ║ 15 ║ 11 ║ 42 ║ 79 ║ 27 ║ 54 ║ ╚════╩════╩══════╩══════╩══════╩══════╝
df2
╔════╦════╗ ║ HJ ║ JK ║ ╠════╬════╣ ║ 99 ║ P1 ║ ║ 11 ║ P5 ║ ║ 5 ║ P3 ║ ║ 21 ║ P2 ║ ║ 11 ║ P3 ║ ╚════╩════╝
expected result for df2 after exaction from df1
╔════╦════╦═══════╗ ║ HJ ║ JK ║ Ans ║ ╠════╬════╬═══════╣ ║ 99 ║ P1 ║ 0 ║ ║ 11 ║ P5 ║ 58 ║ ║ 5 ║ P3 ║ 21 ║ ║ 21 ║ P2 ║ 55 ║ ║ 11 ║ P3 ║ 21 ║ ╚════╩════╩═══════╝
code for df1
import pandas as pd
import numpy as np
data = {'HJ':[5,11,21,99,15],
'P1':[51,66,7,0,11]
,'P2':[ 33,45,55 ,76 ,42]
,'P3':[ 21 ,21 ,56 ,67 ,79]
,'P4':[ 31 ,49 ,67 ,98 ,27]
,'P5':[ 13 ,58 ,73 ,29 ,54]}
df1 = pd.DataFrame(data)
code for df2
data = {'HJ':[99,11,5,21,11],
'JK':['P1','P5','P3','P2','P3']}
df2 = pd.DataFrame(data)
Regards Thank you
===========
@Scott Boston's solution works:
df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
However, a KeyError: 'One or more row labels was not found' appears when there is/are labels not found. Is there any way to overcome this problem?
Upvotes: 5
Views: 250
Reputation: 153460
Use pd.DataFrame.lookup
after set_index
:
df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
print(df2)
Output:
HJ JK ans
0 99 P1 0
1 11 P5 58
2 5 P3 21
3 21 P2 55
4 11 P3 21
Using lookup, you have to filter your inputs to lookup first:
df2m = df2[df2['HJ'].isin(df1['HJ']) & df2['JK'].isin(df1.columns)].copy()
df2m['ans'] = df1.set_index('HJ').lookup(df2m['HJ'],df2m['JK'])
df2.update(df2m)
df2m.combine_first(df2)
Upvotes: 7
Reputation: 20669
You can use pd.Index.get_indexer
here.
df1.set_index('HJ',inplace=True)
row = df1.index.get_indexer(df2['HJ'])
col = df1.columns.get_indexer(df2['JK'])
df2['Ans'] = df1.to_numpy()[row,col] #df1.values[row,col]
df2
HJ JK Ans
0 99 P1 0
1 11 P5 58
2 5 P3 21
3 21 P2 55
4 11 P3 21
Upvotes: 0
Reputation: 71689
An alternate solution using pd.melt
and pd.merge
:
d1 = pd.melt(df1, id_vars=['HJ'], var_name='JK', value_name='Ans')
df = pd.merge(df2, d1, on=['HJ', 'JK'], how='left')
# print(df)
HJ JK Ans
0 99 P1 0
1 11 P5 58
2 5 P3 21
3 21 P2 55
4 11 P3 21
Upvotes: 2
Reputation: 1715
This one line is enough to solve your problem: (Trust me it works! I've tried it)
df2['Ans'] = [ df1[df2['JK'][i]][list(df1['HJ']).index(df2['HJ'][i])] for i in range(len(df2['JK']))]
Hope it helps :)
Upvotes: 0