Reputation: 31
I am new to python and working with dataframes. I have two dataframes, one with data for months and another with data for days in those months. I want the data from the monthly dataframe as a column in the daily dataframe, repeated for the number of days in that month. Thanks, I have tried to Provide with an illustration below.
Monthly DF
Val Date Year Month
0 0.00 2016-01-31 2016 1
1 0.10 2016-02-29 2016 2
2 0.07 2016-03-31 2016 3
3 0.01 2016-04-30 2016 4
4 0.28 2016-05-31 2016 5
DailyDF
Date Year Month Val
0 2016-01-01 2016 1 0
1 2016-01-02 2016 1 0
2 2016-01-03 2016 1 0
3 2016-01-04 2016 1 0
4 2016-01-05 2016 1 0
5 2016-01-06 2016 1 0
6 2016-01-07 2016 1 0
7 2016-01-08 2016 1 0
8 2016-01-09 2016 1 0
9 2016-01-10 2016 1 0
10 2016-01-11 2016 1 0
11 2016-01-12 2016 1 0
12 2016-01-13 2016 1 0
13 2016-01-14 2016 1 0
14 2016-01-15 2016 1 0
15 2016-01-16 2016 1 0
16 2016-01-17 2016 1 0
17 2016-01-18 2016 1 0
18 2016-01-19 2016 1 0
19 2016-01-20 2016 1 0
20 2016-01-21 2016 1 0
21 2016-01-22 2016 1 0
22 2016-01-23 2016 1 0
23 2016-01-24 2016 1 0
24 2016-01-25 2016 1 0
25 2016-01-26 2016 1 0
26 2016-01-27 2016 1 0
27 2016-01-28 2016 1 0
28 2016-01-29 2016 1 0
29 2016-01-30 2016 1 0
.. ... ... ... ...
31 2016-02-01 2016 2 0
32 2016-02-02 2016 2 0
33 2016-02-03 2016 2 0
34 2016-02-04 2016 2 0
35 2016-02-05 2016 2 0
36 2016-02-06 2016 2 0
37 2016-02-07 2016 2 0
38 2016-02-08 2016 2 0
39 2016-02-09 2016 2 0
40 2016-02-10 2016 2 0
41 2016-02-11 2016 2 0
42 2016-02-12 2016 2 0
43 2016-02-13 2016 2 0
44 2016-02-14 2016 2 0
45 2016-02-15 2016 2 0
46 2016-02-16 2016 2 0
47 2016-02-17 2016 2 0
48 2016-02-18 2016 2 0
49 2016-02-19 2016 2 0
50 2016-02-20 2016 2 0
51 2016-02-21 2016 2 0
52 2016-02-22 2016 2 0
53 2016-02-23 2016 2 0
54 2016-02-24 2016 2 0
55 2016-02-25 2016 2 0
56 2016-02-26 2016 2 0
57 2016-02-27 2016 2 0
58 2016-02-28 2016 2 0
59 2016-02-29 2016 2 0
60 2016-03-01 2016 3 0
So in the 'Val' column of Daily Dataframe I want the "Val" from the Monthly Dataframe to be repeated for the number of days in that month.
Expected Output
Date Year Month Val
0 2016-01-01 2016 1 0
1 2016-01-02 2016 1 0
2 2016-01-03 2016 1 0
3 2016-01-04 2016 1 0
4 2016-01-05 2016 1 0
5 2016-01-06 2016 1 0
6 2016-01-07 2016 1 0
7 2016-01-08 2016 1 0
8 2016-01-09 2016 1 0
.. ... ... ... ...
10 2016-01-11 2016 1 0
11 2016-01-12 2016 1 0
12 2016-01-13 2016 1 0
13 2016-01-14 2016 1 0
14 2016-01-15 2016 1 0
15 2016-01-16 2016 1 0
16 2016-01-17 2016 1 0
17 2016-01-18 2016 1 0
18 2016-01-19 2016 1 0
19 2016-01-20 2016 1 0
20 2016-01-21 2016 1 0
21 2016-01-22 2016 1 0
22 2016-01-23 2016 1 0
23 2016-01-24 2016 1 0
24 2016-01-25 2016 1 0
25 2016-01-26 2016 1 0
26 2016-01-27 2016 1 0
27 2016-01-28 2016 1 0
28 2016-01-29 2016 1 0
29 2016-01-30 2016 1 0
.. ... ... ... ...
41 2016-02-11 2016 2 0.10
42 2016-02-12 2016 2 0.10
43 2016-02-13 2016 2 0.10
44 2016-02-14 2016 2 0.10
45 2016-02-15 2016 2 0.10
46 2016-02-16 2016 2 0.10
47 2016-02-17 2016 2 0.10
.. ... ... ... ...
49 2016-03-19 2016 3 0.07
50 2016-03-20 2016 3 0.07
51 2016-03-21 2016 3 0.07
52 2016-03-22 2016 3 0.07
53 2016-03-23 2016 3 0.07
54 2016-03-24 2016 3 0.07
Upvotes: 0
Views: 3735
Reputation: 210882
As @Merlin has already mentioned joining (using pd.merge() method) should be pretty straightforward:
In [126]: pd.merge(daily.drop('Val', 1), monthly.drop('Date', 1), on=['Year','Month'])
Out[126]:
Date Year Month val Val
0 2016-01-01 2016 1 0 0.00
1 2016-01-02 2016 1 0 0.00
2 2016-01-03 2016 1 0 0.00
3 2016-01-04 2016 1 0 0.00
4 2016-01-05 2016 1 0 0.00
5 2016-01-06 2016 1 0 0.00
6 2016-01-07 2016 1 0 0.00
7 2016-01-08 2016 1 0 0.00
8 2016-01-09 2016 1 0 0.00
9 2016-01-10 2016 1 0 0.00
10 2016-01-11 2016 1 0 0.00
11 2016-01-12 2016 1 0 0.00
12 2016-01-13 2016 1 0 0.00
13 2016-01-14 2016 1 0 0.00
14 2016-01-15 2016 1 0 0.00
.. ... ... ... ... ...
137 2016-05-17 2016 5 0 0.28
138 2016-05-18 2016 5 0 0.28
139 2016-05-19 2016 5 0 0.28
140 2016-05-20 2016 5 0 0.28
141 2016-05-21 2016 5 0 0.28
142 2016-05-22 2016 5 0 0.28
143 2016-05-23 2016 5 0 0.28
144 2016-05-24 2016 5 0 0.28
145 2016-05-25 2016 5 0 0.28
146 2016-05-26 2016 5 0 0.28
147 2016-05-27 2016 5 0 0.28
148 2016-05-28 2016 5 0 0.28
149 2016-05-29 2016 5 0 0.28
150 2016-05-30 2016 5 0 0.28
151 2016-05-31 2016 5 0 0.28
[152 rows x 5 columns]
I want to offer you a bit more challenging task - generate your desired DF just from the MonthlyDF:
In [108]: df
Out[108]:
Val Date Year Month
0 0.00 2016-01-31 2016 1
1 0.10 2016-02-28 2016 2
2 0.07 2016-03-31 2016 3
3 0.01 2016-04-30 2016 4
4 0.28 2016-05-31 2016 5
In [117]: df.set_index('Date').resample('MS').mean().append(x.iloc[[-1]]).resample('D').pad().reset_index()
Out[117]:
Date Val Year Month
0 2016-01-01 0.00 2016 1
1 2016-01-02 0.00 2016 1
2 2016-01-03 0.00 2016 1
3 2016-01-04 0.00 2016 1
4 2016-01-05 0.00 2016 1
5 2016-01-06 0.00 2016 1
6 2016-01-07 0.00 2016 1
7 2016-01-08 0.00 2016 1
8 2016-01-09 0.00 2016 1
9 2016-01-10 0.00 2016 1
10 2016-01-11 0.00 2016 1
11 2016-01-12 0.00 2016 1
12 2016-01-13 0.00 2016 1
13 2016-01-14 0.00 2016 1
14 2016-01-15 0.00 2016 1
.. ... ... ... ...
137 2016-05-17 0.28 2016 5
138 2016-05-18 0.28 2016 5
139 2016-05-19 0.28 2016 5
140 2016-05-20 0.28 2016 5
141 2016-05-21 0.28 2016 5
142 2016-05-22 0.28 2016 5
143 2016-05-23 0.28 2016 5
144 2016-05-24 0.28 2016 5
145 2016-05-25 0.28 2016 5
146 2016-05-26 0.28 2016 5
147 2016-05-27 0.28 2016 5
148 2016-05-28 0.28 2016 5
149 2016-05-29 0.28 2016 5
150 2016-05-30 0.28 2016 5
151 2016-05-31 0.28 2016 5
[152 rows x 4 columns]
Explanation:
resample MonthlyDF
to the begin-of-month
In [112]: df.set_index('Date').resample('MS').mean()
Out[112]:
Val Year Month
Date
2016-01-01 0.00 2016 1
2016-02-01 0.10 2016 2
2016-03-01 0.07 2016 3
2016-04-01 0.01 2016 4
2016-05-01 0.28 2016 5
add last row from the original MonthlyDF
:
In [113]: df.set_index('Date').resample('MS').mean().append(x.iloc[[-1]])
Out[113]:
Val Year Month
Date
2016-01-01 0.00 2016 1
2016-02-01 0.10 2016 2
2016-03-01 0.07 2016 3
2016-04-01 0.01 2016 4
2016-05-01 0.28 2016 5
2016-05-31 0.28 2016 5
after that we can easily resample it using daily
rule: D
Upvotes: 1