Nisha Pandey

Reputation: 109

Divide columns in df by another df value based on condition

I have a dataframe:

df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                   'month': ['1', '1', '3', '3', '5'],
                   'pmonth': ['1', '1', '2', '5', '5'],
                   'duration': [30, 15, 20, 15, 30],
                   'pduration': ['10', '20', '30', '40', '50']})

I need to divide the duration and pduration columns by the value column of a second dataframe, for the rows where date and month match in both dataframes. The second df is:

df = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                   'month': ['1', '1', '3', '3', '5'],
                   'value': ['1', '1', '2', '5', '5']})

The second df is grouped by date and month, so it contains no duplicate (date, month) combinations.

Upvotes: 3

Views: 2503

Answers (3)

jezrael

Reputation: 862591

First, check that the date and month columns have the same dtypes in both DataFrames, and that the columns used in the division are numeric:

#convert to numeric
df1['pduration'] = df1['pduration'].astype(int)
df2['value'] = df2['value'].astype(int)

print (df1.dtypes)
date         object
month        object
pmonth       object
duration      int64
pduration     int32

print (df2.dtypes)
date     object
month    object
value     int32
dtype: object
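
If the string columns might contain entries that are not clean integers, pd.to_numeric is a more forgiving alternative to astype(int). The following is only a sketch under assumptions: it rebuilds the question's data, uses the names df1 and df2 for the two frames, and introduces a helper variable same_key_dtypes that is not part of the original answer.

```python
import pandas as pd

# Sample data from the question; df1/df2 are the names assumed in this answer
df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'pmonth': ['1', '1', '2', '5', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

# errors='coerce' turns unparseable strings into NaN instead of raising
df1['pduration'] = pd.to_numeric(df1['pduration'], errors='coerce')
df2['value'] = pd.to_numeric(df2['value'], errors='coerce')

# The join keys should have the same dtype in both frames
# (same_key_dtypes is a hypothetical helper name)
same_key_dtypes = all(df1[c].dtype == df2[c].dtype for c in ['date', 'month'])
print(same_key_dtypes)
```

With errors='coerce', a bad entry shows up later as NaN in the result rather than stopping the conversion, so it is worth checking for unexpected NaN afterwards.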

Then merge with a left join and divide with DataFrame.div:

df = df1.merge(df2, on=['date', 'month'], how='left')

df[['duration_new','pduration_new']] = df[['duration','pduration']].div(df['value'], axis=0)
print (df)
         date month pmonth  duration  pduration  value  duration_new  \
0  2013-04-01     1      1        30         10    1.0          30.0   
1  2013-04-01     1      1        15         20    1.0          15.0   
2  2013-04-01     3      2        20         30    NaN           NaN   
3  2013-04-02     3      5        15         40    NaN           NaN   
4  2013-04-02     5      5        30         50    NaN           NaN   

   pduration_new  
0           10.0  
1           20.0  
2            NaN  
3            NaN  
4            NaN  

To remove the value column, use pop:

df[['duration_new','pduration_new']] = (df[['duration','pduration']]
                                             .div(df.pop('value'), axis=0))
print (df)
         date month pmonth  duration  pduration  duration_new  pduration_new
0  2013-04-01     1      1        30         10          30.0           10.0
1  2013-04-01     1      1        15         20          15.0           20.0
2  2013-04-01     3      2        20         30           NaN            NaN
3  2013-04-02     3      5        15         40           NaN            NaN
4  2013-04-02     5      5        30         50           NaN            NaN
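
The question states that (date, month) is unique in the second DataFrame. If you would rather have pandas check that than rely on it, merge accepts a validate argument. A self-contained sketch of the whole approach, assuming the question's data and the names df1, df2 and out (out is only an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'pmonth': ['1', '1', '2', '5', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

# Convert the string columns that take part in the division
df1['pduration'] = df1['pduration'].astype(int)
df2['value'] = df2['value'].astype(int)

# validate='many_to_one' raises MergeError if a (date, month) pair
# appears more than once in df2
out = df1.merge(df2, on=['date', 'month'], how='left', validate='many_to_one')

# Divide both columns row-wise by value, dropping value in the same step
cols = ['duration', 'pduration']
out[[c + '_new' for c in cols]] = out[cols].div(out.pop('value'), axis=0)
print(out)
```

Without the check, a duplicated key in df2 would silently duplicate the matching rows of df1 in the merged result.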

Upvotes: 3

plasmon360

Reputation: 4199

You can merge the two dataframes: where date and month match, the value column is added to the first dataframe, and where there is no match, value is NaN. You can then do the division. See the code below.

Assuming your second dataframe is df2, then

df3 = df2.merge(df, how = 'right')
for col in ['duration','pduration']:
    df3['new_'+col] = df3[col].astype(float)/df3['value'].astype(float)
df3

results in

         date month value pmonth  duration pduration  new_duration  new_pduration
0  2013-04-01     1     1      1        30        10          30.0           10.0
1  2013-04-01     1     1      1        15        20          15.0           20.0
2  2013-04-01     3   NaN      2        20        30           NaN            NaN
3  2013-04-02     3   NaN      5        15        40           NaN            NaN
4  2013-04-02     5   NaN      5        30        50           NaN            NaN
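
To see which rows found no match before dividing, merge can add a _merge column via indicator=True. A small sketch on the question's data; the names df1, df2, checked and unmatched are assumptions made for this example:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'pmonth': ['1', '1', '2', '5', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

# indicator=True adds a '_merge' column: 'both' for matched rows,
# 'left_only' for rows of df1 with no (date, month) partner in df2
checked = df1.merge(df2, on=['date', 'month'], how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['date', 'month']])
```

On this data the last three rows are unmatched, which is why their divided values come out as NaN.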

Upvotes: 0

Mohit Musaddi

Reputation: 143

You can merge the second df into the first df and then divide.

Treat the first df as df1 and the second df as df2:

df1 = df1.merge(df2, on=['date', 'month'], how='left').fillna(1)
df1
         date month pmonth  duration pduration value
0  2013-04-01     1      1        30        10     1
1  2013-04-01     1      1        15        20     1
2  2013-04-01     3      2        20        30     1
3  2013-04-02     3      5        15        40     1
4  2013-04-02     5      5        30        50     1

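# 'value' and 'pduration' are strings in the question's data, so convert them before dividing
df1['value'] = pd.to_numeric(df1['value'])
df1['duration'] = df1['duration'] / df1['value']
df1['pduration'] = pd.to_numeric(df1['pduration']) / df1['value']
df1.drop('value', axis=1, inplace=True)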
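
Note that calling fillna(1) on the whole merged frame replaces NaN in every column, not only in value. If other columns could contain missing data, one option is to fill just the divisor. A sketch under assumptions, using the question's data and the illustrative names merged and divisor:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'pmonth': ['1', '1', '2', '5', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

merged = df1.merge(df2, on=['date', 'month'], how='left')

# Fill only the divisor: unmatched rows are divided by 1 and so keep their values
divisor = pd.to_numeric(merged.pop('value')).fillna(1)
for col in ['duration', 'pduration']:
    merged[col] = pd.to_numeric(merged[col]) / divisor
print(merged)
```

Whether dividing unmatched rows by 1 is the right behaviour depends on the use case; leaving them as NaN, as in the other answers, keeps the missing match visible.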

Upvotes: 0
