Demetri Pananos
Demetri Pananos

Reputation: 7404

Accessing a dictionary with a pandas date range

I have a dataframe with a bunch of dates

0   2016-01-01
1   2016-01-02
2   2016-01-03
3   2016-01-04
4   2016-01-05
5   2016-01-06
6   2016-01-07
7   2016-01-08
8   2016-01-09
9   2016-01-10
10  2016-01-11
11  2016-01-12
12  2016-01-13
13  2016-01-14
14  2016-01-15
15  2016-01-16
16  2016-01-17
17  2016-01-20
18  2016-01-21
19  2016-01-22
20  2016-01-24
21  2016-01-25
22  2016-01-27
23  2016-01-28
24  2016-01-29
25  2016-01-30
26  2016-01-31

I'd like to group the dataframe by the date using r = df.group_by('time'), and then loop through the keys to get some stats. The thing is that the days are not complete (you'll see I'm missing the 18 and 19 of January). So what I would really like to do is create a date range, and then loop through the date range. But when I try that, I get a key error when I pass the elements of the date range into the dictionary.

Any ideas on how I could do this?

Here is some code:

doi = (df.time<='2016-01-31')&(df.time>='2016-01-01')
oil = df[doi]


#Trouble Here.
r = oil.groupby(by = 'time')
D = oil.time
dates = pd.date_range(D.min(),D.max())
frames = []

for d in dates:
#The idea here is that if the date in the date range is not in the dataframe,
#Then there is no sum to compute.  return 0
    try:
        sum_of_oil = oil.ix[r.groups[d]].capacity.sum()
    except KeyError:
        sum_of_oil = 0
    frames.append([d,sum_of_oil])

frames = pd.DataFrame(frames, columns = ['time','volume'])

Might be worth noting that the elements of oil.time are Timestamps.

Upvotes: 1

Views: 874

Answers (2)

Parfait
Parfait

Reputation: 107642

Consider merging with a full complete set of month days:

import datetime
import pandas as pd

startdate = datetime.datetime.strptime('2015-01-01', '%Y-%m-%d')
jandates = [startdate + datetime.timedelta(days=i) for i in range(31)]

datesdf = pd.DataFrame({'date':jandates})    
mergedf = pd.merge(datesdf, actualdf, on='date', how='left').fillna(0)

Upvotes: 1

su79eu7k
su79eu7k

Reputation: 7316

Even incomplete timeseries, You can resample.

         date    qty
0  2015-01-01    123
1  2015-01-02    213
2  2015-01-03  41234
3  2015-01-04  12342
4  2015-01-05     32
5  2015-01-06      3
6  2015-01-07     24
7  2015-01-08  23423
8  2015-01-09      4
9  2015-01-10    234
10 2015-01-12    234
11 2015-01-13    324
12 2015-01-17    123
13 2015-01-18      5
14 2015-01-19   3454
15 2015-01-20    574
16 2015-01-21     51
17 2015-01-22     56

Try

print df.set_index('date').resample('D').fillna(0).reset_index()

which yields,

         date    qty
0  2015-01-01    123
1  2015-01-02    213
2  2015-01-03  41234
3  2015-01-04  12342
4  2015-01-05     32
5  2015-01-06      3
6  2015-01-07     24
7  2015-01-08  23423
8  2015-01-09      4
9  2015-01-10    234
10 2015-01-11      0
11 2015-01-12    234
12 2015-01-13    324
13 2015-01-14      0
14 2015-01-15      0
15 2015-01-16      0
16 2015-01-17    123
17 2015-01-18      5
18 2015-01-19   3454
19 2015-01-20    574
20 2015-01-21     51
21 2015-01-22     56

Upvotes: 2

Related Questions