dideod98
dideod98

Reputation: 111

pandas how to divide to get ratio for different two dataframe

df1 is reference and df2 is target. df2's TYPE column value should be ignored just keep value no change.. Because of TYPE column in df2, I couldn't divide directly. How can I compare two dataframes and get ratio from reference dataframe. I have to keep df2 dataframe and just get ratio value about 'sum' updated..

python

 import numpy as np
 import pandas as pd
 df_data = {}
 df_data['ID'] = [100001,100002,100003,100004]
 df_data['ID2'] = ['A','B','C','D']
 df_data['sum'] = [7,8,4,5]
 df = pd.DataFrame(df_data)
 print(df)

 df_data2= {}
 df_data2['ID'] = [100001,100002,100002,100003,100003,100001,100002]
 df_data2['ID2'] = ['G','H','Q','J','H','A','B']
 df_data2['TYPE'] = ['A','A','B','A','B','C','E']
 df_data2['sum'] = [14,4,4,2,8,100,10]
 df2 = pd.DataFrame(df_data2)
 print(df2)

 # my trying. I can get value but df2's dataframe is broken. I can't find value for TYPE column..
 df.set_index(['ID','ID2'])['sum'] / df.set_index(['ID','ID2'])['sum']
#printout df
       ID ID2  sum
0  100001   A    7
1  100002   B    8
2  100003   C    4
3  100004   D    5
#print df2
       ID ID2 TYPE  sum
0  100001   G    A   14
1  100002   H    A    4
2  100002   Q    B    4
3  100003   J    A    2
4  100003   H    B    8
5  100001   A    C  100
6  100002   B    E   10

# my goal
       ID ID2 TYPE  sum
0  100001   G    A   N/A  # There is no value ( ID:100001 ID2:G)
1  100002   H    A   N/A  # There is no value ( ID:100002 ID2:H)
2  100002   Q    B   N/A  # There is no value ( ID:100002 ID2:Q)
3  100003   J    A   N/A
4  100003   H    B   N/A
5  100001   A    C   25.0  # There is value ( ID:100001 ID2:A)
6  100002   B    E   2.0   # There is value ( ID:100002 ID2:B)

#my trying
ID      ID2
100001  A      14.285714
        G            NaN
100002  B       1.250000
        H            NaN
        Q            NaN
100003  C            NaN
        H            NaN
        J            NaN
100004  D            NaN

Upvotes: 3

Views: 990

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150775

This can be merge:

df2['sum'] = (df2.merge(df, on=['ID','ID2'],
                        how='left')
                 .assign(sum=lambda x: x.sum_x/x.sum_y)
                 ['sum']
             )

Output:

       ID ID2 TYPE        sum
0  100001   G    A        NaN
1  100002   H    A        NaN
2  100002   Q    B        NaN
3  100003   J    A        NaN
4  100003   H    B        NaN
5  100001   A    C  14.285714
6  100002   B    E   1.250000

Upvotes: 1

Related Questions