ManOnTheMoon
ManOnTheMoon

Reputation: 597

extracting values from dataframe1 using conditions set in dataframe2 (pandas, python)

I have two dateframe (df1 & df2), i'm trying to figure out how to use conditions from df2 to extract values from df1 and use the extracted values in df2.

df1 = values to exact from

df2 = conditions for exaction and df where the extracted values are used

conditions: df2.HJ = df1HJ & df2.JK = df1 P colum

example if df2(df2.HJ = 99 & df2.JK = P3); Ans = 67 (from df1)

df1

╔════╦════╦══════╦══════╦══════╦══════╗
║ HJ ║ P1 ║  P2  ║  P3  ║  P4  ║  P5  ║
╠════╬════╬══════╬══════╬══════╬══════╣
║  5 ║ 51 ║  33  ║  21  ║  31  ║  13  ║
║ 11 ║ 66 ║  45  ║  21  ║  49  ║  58  ║
║ 21 ║  7 ║  55  ║  56  ║  67  ║  73  ║
║ 99 ║  0 ║  76  ║  67  ║  98  ║  29  ║
║ 15 ║ 11 ║  42  ║  79  ║  27  ║  54  ║
╚════╩════╩══════╩══════╩══════╩══════╝

df2

╔════╦════╗
║ HJ ║ JK ║
╠════╬════╣
║ 99 ║ P1 ║
║ 11 ║ P5 ║
║  5 ║ P3 ║
║ 21 ║ P2 ║
║ 11 ║ P3 ║
╚════╩════╝

expected result for df2 after exaction from df1

╔════╦════╦═══════╗
║ HJ ║ JK ║  Ans  ║
╠════╬════╬═══════╣
║ 99 ║ P1 ║    0  ║
║ 11 ║ P5 ║   58  ║
║  5 ║ P3 ║   21  ║
║ 21 ║ P2 ║   55  ║
║ 11 ║ P3 ║   21  ║
╚════╩════╩═══════╝

code for df1

import pandas as pd
import numpy as np
data = {'HJ':[5,11,21,99,15],
'P1':[51,66,7,0,11]
,'P2':[ 33,45,55 ,76 ,42]
,'P3':[ 21 ,21 ,56 ,67 ,79]
,'P4':[ 31 ,49 ,67 ,98 ,27]
,'P5':[ 13 ,58 ,73 ,29 ,54]}
df1 = pd.DataFrame(data)

code for df2

data = {'HJ':[99,11,5,21,11],
'JK':['P1','P5','P3','P2','P3']}
df2 = pd.DataFrame(data)

Regards Thank you

===========

Update

@Scott Boston's solution works:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK']) 

However, a KeyError: 'One or more row labels was not found' appears when there is/are labels not found. Is there any way to overcome this problem?

Upvotes: 5

Views: 250

Answers (4)

Scott Boston
Scott Boston

Reputation: 153460

Use pd.DataFrame.lookup after set_index:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
print(df2)

Output:

   HJ  JK  ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

Using lookup, you have to filter your inputs to lookup first:

df2m = df2[df2['HJ'].isin(df1['HJ']) & df2['JK'].isin(df1.columns)].copy()

df2m['ans'] = df1.set_index('HJ').lookup(df2m['HJ'],df2m['JK'])

df2.update(df2m)

df2m.combine_first(df2)

Upvotes: 7

Ch3steR
Ch3steR

Reputation: 20669

You can use pd.Index.get_indexer here.

df1.set_index('HJ',inplace=True)
row = df1.index.get_indexer(df2['HJ'])
col = df1.columns.get_indexer(df2['JK'])

df2['Ans'] = df1.to_numpy()[row,col] #df1.values[row,col]
df2
   HJ  JK  Ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

Upvotes: 0

Shubham Sharma
Shubham Sharma

Reputation: 71689

An alternate solution using pd.melt and pd.merge:

d1 = pd.melt(df1, id_vars=['HJ'], var_name='JK', value_name='Ans')
df = pd.merge(df2, d1, on=['HJ', 'JK'], how='left')

# print(df)

   HJ  JK  Ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

Upvotes: 2

Sahith Kurapati
Sahith Kurapati

Reputation: 1715

This one line is enough to solve your problem: (Trust me it works! I've tried it)

df2['Ans'] = [ df1[df2['JK'][i]][list(df1['HJ']).index(df2['HJ'][i])] for i in range(len(df2['JK']))]

Hope it helps :)

Upvotes: 0

Related Questions