Reputation: 111
df1 is reference and df2 is target. df2's TYPE column value should be ignored just keep value no change.. Because of TYPE column in df2, I couldn't divide directly. How can I compare two dataframes and get ratio from reference dataframe. I have to keep df2 dataframe and just get ratio value about 'sum' updated..
python
import numpy as np
import pandas as pd
df_data = {}
df_data['ID'] = [100001,100002,100003,100004]
df_data['ID2'] = ['A','B','C','D']
df_data['sum'] = [7,8,4,5]
df = pd.DataFrame(df_data)
print(df)
df_data2= {}
df_data2['ID'] = [100001,100002,100002,100003,100003,100001,100002]
df_data2['ID2'] = ['G','H','Q','J','H','A','B']
df_data2['TYPE'] = ['A','A','B','A','B','C','E']
df_data2['sum'] = [14,4,4,2,8,100,10]
df2 = pd.DataFrame(df_data2)
print(df2)
# my trying. I can get value but df2's dataframe is broken. I can't find value for TYPE column..
df.set_index(['ID','ID2'])['sum'] / df.set_index(['ID','ID2'])['sum']
#printout df
ID ID2 sum
0 100001 A 7
1 100002 B 8
2 100003 C 4
3 100004 D 5
#print df2
ID ID2 TYPE sum
0 100001 G A 14
1 100002 H A 4
2 100002 Q B 4
3 100003 J A 2
4 100003 H B 8
5 100001 A C 100
6 100002 B E 10
# my goal
ID ID2 TYPE sum
0 100001 G A N/A # There is no value ( ID:100001 ID2:G)
1 100002 H A N/A # There is no value ( ID:100002 ID2:H)
2 100002 Q B N/A # There is no value ( ID:100002 ID2:Q)
3 100003 J A N/A
4 100003 H B N/A
5 100001 A C 25.0 # There is value ( ID:100001 ID2:A)
6 100002 B E 2.0 # There is value ( ID:100002 ID2:B)
#my trying
ID ID2
100001 A 14.285714
G NaN
100002 B 1.250000
H NaN
Q NaN
100003 C NaN
H NaN
J NaN
100004 D NaN
Upvotes: 3
Views: 990
Reputation: 150775
This can be merge
:
df2['sum'] = (df2.merge(df, on=['ID','ID2'],
how='left')
.assign(sum=lambda x: x.sum_x/x.sum_y)
['sum']
)
Output:
ID ID2 TYPE sum
0 100001 G A NaN
1 100002 H A NaN
2 100002 Q B NaN
3 100003 J A NaN
4 100003 H B NaN
5 100001 A C 14.285714
6 100002 B E 1.250000
Upvotes: 1