hdatas
hdatas

Reputation: 1082

Data frame manipulation with Date

I have 2 data frames as follow:

df1:

       id       Grade         Date
1      78        15        2016-05-23
2      99        12        2015-08-01

df2:

                 rate
2015-01-01       1.22
2015-02-01       1.12
   ...
2015-05-01       1.05
2017-01-01       1.33

I would like to multiply the Grade in df1 with rate in df2 that has same month. So for 2016-05-23 it is in month 05 for I would mulyplu it by 1.05.

Any suggestions ? Thank you for help

Upvotes: 1

Views: 60

Answers (3)

andrew_reece
andrew_reece

Reputation: 21264

If you can rely on only having one instance of a month in df2, here's a working solution:

First, generate sample data based on OP's example.
Note that the first date in df2 is changed to demonstrate functionality for two different month numbers.

df1 = pd.DataFrame([[78, 15, "2016-05-23"],
                    [99, 12, "2015-08-01"]], 
                   columns=['id', 'Grade', 'Date'])

df2 = pd.DataFrame([["2015-08-01", 1.22],
                    ["2015-02-01", 1.12],
                    ["2015-05-01", 1.05],
                    ["2017-01-01", 1.33]],
                  columns=['Date','rate'])

Now extract the month numbers from each DataFrame and save as new columns:

df1['month_num'] = pd.to_datetime(df1.Date).dt.month
df2['month_num'] = pd.to_datetime(df2.Date).dt.month

Finally, compute the product of df1.Grade and the associated rate:

df1['Grade_X_rate'] = df1.Grade.multiply(df1.merge(df2[['month_num','rate']], 
                                                   on="month_num", 
                                                   how="left").rate
                                        )

   id  Grade        Date  month_num  Grade_X_rate
0  78     15  2016-05-23          5         15.75
1  99     12  2015-08-01          8         14.64

Upvotes: 0

Andy Hayden
Andy Hayden

Reputation: 375485

If you set the df2 index to a monthly PeriodIndex:

In [11]: df2.index = df2.index.to_period("M")

In [12]: df2
Out[12]:
         rate
2015-01  1.22
2015-02  1.12
2016-05  1.32
2015-08  1.23

Now, you can pull out the rates efficiently with df2.loc:

In [13]: df2.loc[df1.Date.dt.to_period("M")]["rate"]
Out[13]:
2016-05    1.32
2015-08    1.23
Freq: M, Name: rate, dtype: float64

Now, you can multiply:

In [14]: df2.loc[df1.Date.dt.to_period("M")]["rate"].values * df1["Grade"]
Out[14]:
1    19.80
2    14.76
Name: Grade, dtype: float64


In [21]: df1["NormedGrade"] = df2.loc[df1.Date.dt.to_period("M")]["rate"].values * df1["Grade"]

In [22]: df1
Out[22]:
   id  Grade       Date  Normed Grade
1  78     15 2016-05-23         19.80
2  99     12 2015-08-01         14.76

Upvotes: 2

Allen Qin
Allen Qin

Reputation: 19947

Setup

df1
Out[903]: 
   id  Grade       Date
1  78     15 2016-05-23
2  99     12 2015-08-01

df2
Out[904]: 
        Date  rate
0 2015-01-01  1.22
1 2015-02-01  1.12
2 2015-05-01  1.05
3 2017-01-01  1.33

Solution

#extract month from date and put it in a new column
df1['month'] = df1.Date.apply(lambda x: x.month)
df2['month'] = df2.Date.apply(lambda x: x.month)
#merge two dfs to get the rate
df3 = pd.merge(df1,df2,how='left',on='month')
#multiply grade by its rate.
df3['Grade_rate'] = df3.Grade*df3.rate
df3
Out[897]: 
   id  Grade     Date_x  month     Date_y  rate  Grade_rate
0  78     15 2016-05-23      5 2015-05-01  1.05       15.75
1  99     12 2015-08-01      8        NaT   NaN         NaN

Upvotes: 1

Related Questions