Reputation: 13015
I have created a dataframe with time series information, for instance
Time daily-bill
2012-01-01 200
2012-01-02 300
2012-01-03 100
2012-01-04 500
….
I would like to create another time series dataframe based on the above time series. How to do it in Pandas?
Time(weekday-and-weekend) total-bill
Monday-Friday
Weekend
Monday-Friday
Weekend
Monday-Friday
Weekend
In other words, the time step will be a continuous sequence of weekday
and weekend
. The weekday
consists of Monday to Friday
; while the weekend
consists of Saturday
and Sunday
. The total-bill
column will store the sum of bills happened in the corresponding days, these information comes from the existing time series.
Upvotes: 1
Views: 457
Reputation: 862561
Use:
print (df)
Time daily-bill
0 2012-01-01 200
1 2012-01-02 300
2 2012-01-03 100
3 2012-01-04 500
4 2012-01-05 200
5 2012-01-06 300
6 2012-01-07 100
7 2012-01-08 500
8 2012-01-09 500
arr = np.where(df['Time'].dt.weekday > 4, 'Weekend','Monday-Friday')
s = pd.Series(arr)
s1 = s.ne(s.shift()).cumsum()
df = (df['daily-bill'].groupby([s1,s.rename('Time')])
.sum()
.reset_index(level=0, drop=True)
.reset_index())
print (df)
Time daily-bill
0 Weekend 200
1 Monday-Friday 1400
2 Weekend 600
3 Monday-Friday 500
Explanation:
Series
by weekday
and numpy.where
.Series
which is create by cumsum
of shifted s
by shift
for distinguish consecutive valuessum
and remove first level by reset_index
with drop=True
Details:
print (s)
0 Weekend
1 Monday-Friday
2 Monday-Friday
3 Monday-Friday
4 Monday-Friday
5 Monday-Friday
6 Weekend
7 Weekend
8 Monday-Friday
dtype: object
print (s1)
0 1
1 2
2 2
3 2
4 2
5 2
6 3
7 3
8 4
dtype: int32
EDIT:
If firts column of input DataFrame
is DatetimeIndex
:
print (df)
daily-bill
Time
2012-01-01 200
2012-01-02 300
2012-01-03 100
2012-01-04 500
2012-01-05 200
2012-01-06 300
2012-01-07 100
2012-01-08 500
2012-01-09 500
arr = np.where(df.index.weekday > 4, 'Weekend','Monday-Friday')
s = pd.Series(arr, index=df.index)
s1 = s.ne(s.shift()).cumsum()
df = (df['daily-bill'].groupby([s1,s.rename('Time')])
.sum()
.reset_index(level=0, drop=True)
.reset_index())
print (df)
Time daily-bill
0 Weekend 200
1 Monday-Friday 1400
2 Weekend 600
3 Monday-Friday 500
Upvotes: 1