John
John

Reputation: 43289

Sum of days in intervals

I have a pandas data frame as follows:

ID      Start        End
1  2013-03-03 2013-04-05
2  2013-03-03 2013-04-05
3  2012-03-02 2012-05-05
4  2013-03-01 2013-04-04
5  2013-02-25 2013-06-05
6  2012-02-26 2012-05-05
7  2013-02-25 2013-04-27
8  2012-02-26 2012-03-01
9  2013-03-22 2013-03-25
10 2013-06-27 2013-07-01

Is there a way to aggregate the number of days that occured per month:Year for each interval for each ID.

The output I am trying to get to is the sum of all the days per month:Year that each ID contributes.

So, ID1 will contrinute 29 days to March 2013 and 5 days to April 2013; ID3 will contribute 30days to March 2012, 30 days to April 2012, and 5 days to May 2012. Etc....

Thanks in advance if you can help.

Upvotes: 3

Views: 217

Answers (2)

Andy Hayden
Andy Hayden

Reputation: 375745

Here's an awful way to do it:

In [11]: df1.apply(lambda x: pd.Series(1, pd.date_range(x.loc['Start'], x.loc['End'] - pd.offsets.Day(1), freq='D')).resample('M', how='sum'), axis=1).fillna(0)
Out[11]: 
    2012-02-29  2012-03-31  2012-04-30  2012-05-31  2013-02-28  2013-03-31  2013-04-30  2013-05-31  2013-06-30
ID                                                                                                            
1            0           0           0           0           0          29           4           0           0
2            0           0           0           0           0          29           4           0           0
3            0          30          30           4           0           0           0           0           0
4            0           0           0           0           0          31           3           0           0
5            0           0           0           0           4          31          30          31           4
6            4          31          30           4           0           0           0           0           0
7            0           0           0           0           4          31          26           0           0
8            4           0           0           0           0           0           0           0           0
9            0           0           0           0           0           3           0           0           0
10           0           0           0           0           0           0           0           0           4

To break it down, for each row we are calculating the days in each month by creating a Series of 1s for every day between the start and end, then summing them up for each month using resample:

In [12]: x = df1.iloc[0]

In [13]: x
Out[13]: 
Start   2013-03-03 00:00:00
End     2013-04-05 00:00:00
Name: 1, dtype: datetime64[ns]


In [14]: pd.Series(1, pd.date_range(x['Start'], x['End'] - pd.offsets.Day(1), freq='D')).resample('M', how='sum')
Out[14]: 
2013-03-31    29
2013-04-30     4
Freq: M, dtype: int64

Upvotes: 4

Raymond Hettinger
Raymond Hettinger

Reputation: 226604

The datetime module has the tools you need to do month by month manipulation of dates.

Here's some quick stand-alone code that can easily be adapted to a Pandas dataframe:

import datetime
import collections
import pprint

data = '''\
1  2013-03-03 2013-04-05
2  2013-03-03 2013-04-05
3  2012-03-02 2012-05-05
4  2013-03-01 2013-04-04
5  2013-02-25 2013-06-05
6  2012-02-26 2012-05-05
7  2013-02-25 2013-04-27
8  2012-02-26 2012-03-01
9  2013-03-22 2013-03-25
10 2013-06-27 2013-07-01
'''

for line in data.splitlines():
    idnum, startstr, endstr = line.split()
    start = datetime.datetime.strptime(startstr, '%Y-%m-%d')
    end = datetime.datetime.strptime(endstr, '%Y-%m-%d')

    days_in_month = collections.Counter()
    current = start
    while current <= end:
        days_in_month[current.year, current.month] += 1
        current += datetime.timedelta(1)
    print('Id: %s' % idnum)
    pprint.pprint(dict(days_in_month))
    print('-' * 10)

Upvotes: 1

Related Questions