user288609
user288609

Reputation: 13015

create a week-day/weekend time series dataframe based on a daily time series dataframe

I have created a dataframe with time series information, for instance

Time      daily-bill
2012-01-01   200
2012-01-02  300
2012-01-03   100
2012-01-04    500
….

I would like to create another time series dataframe based on the above time series. How to do it in Pandas?

Time(weekday-and-weekend)                       total-bill
Monday-Friday
Weekend
Monday-Friday
Weekend
Monday-Friday
Weekend

In other words, the time step will be a continuous sequence of weekday and weekend. The weekday consists of Monday to Friday; while the weekend consists of Saturday and Sunday. The total-bill column will store the sum of bills happened in the corresponding days, these information comes from the existing time series.

Upvotes: 1

Views: 457

Answers (1)

jezrael
jezrael

Reputation: 862561

Use:

print (df)
        Time  daily-bill
0 2012-01-01         200
1 2012-01-02         300
2 2012-01-03         100
3 2012-01-04         500
4 2012-01-05         200
5 2012-01-06         300
6 2012-01-07         100
7 2012-01-08         500
8 2012-01-09         500

arr = np.where(df['Time'].dt.weekday > 4, 'Weekend','Monday-Friday')

s = pd.Series(arr)
s1 = s.ne(s.shift()).cumsum()

df = (df['daily-bill'].groupby([s1,s.rename('Time')])
                     .sum()
                     .reset_index(level=0, drop=True)
                     .reset_index())
print (df)
            Time  daily-bill
0        Weekend         200
1  Monday-Friday        1400
2        Weekend         600
3  Monday-Friday         500

Explanation:

  1. First create Series by weekday and numpy.where.
  2. Then create another Series which is create by cumsum of shifted s by shift for distinguish consecutive values
  3. Aggregate sum and remove first level by reset_index with drop=True

Details:

print (s)
0          Weekend
1    Monday-Friday
2    Monday-Friday
3    Monday-Friday
4    Monday-Friday
5    Monday-Friday
6          Weekend
7          Weekend
8    Monday-Friday
dtype: object

print (s1)
0    1
1    2
2    2
3    2
4    2
5    2
6    3
7    3
8    4
dtype: int32

EDIT:

If firts column of input DataFrame is DatetimeIndex:

print (df)
            daily-bill
Time                  
2012-01-01         200
2012-01-02         300
2012-01-03         100
2012-01-04         500
2012-01-05         200
2012-01-06         300
2012-01-07         100
2012-01-08         500
2012-01-09         500

arr = np.where(df.index.weekday > 4, 'Weekend','Monday-Friday')

s = pd.Series(arr, index=df.index)
s1 = s.ne(s.shift()).cumsum()

df = (df['daily-bill'].groupby([s1,s.rename('Time')])
                     .sum()
                     .reset_index(level=0, drop=True)
                     .reset_index())
print (df)
            Time  daily-bill
0        Weekend         200
1  Monday-Friday        1400
2        Weekend         600
3  Monday-Friday         500

Upvotes: 1

Related Questions