Reputation: 321
I have a dataframe like this:
rx
10:06:33 928000
10:06:34 400000
10:06:36 632000
10:06:46 712000
10:06:48 736000
10:06:49 856000
10:06:51 736000
10:06:58 728000
10:07:01 760000
Name: size, dtype: int64
This is what my index looks like:
Index([u'10:06:33', u'10:06:34', u'10:06:36', u'10:06:46', u'10:06:48',
u'10:06:49', u'10:06:51', u'10:06:58', u'10:07:01'],
dtype='object', name=u'rx')
I want to change my index so that it starts from 0 while preserving the time differences: 10:06:33 should become 0, 10:06:34 should become 1, and 10:06:36 should become 3. Some seconds are "missing" in between, which is why I cannot simply use df = df.reset_index(drop=True). The data column should also be zero wherever a second is missing, so my dataframe should look something like this:
rx
0 928000
1 400000
2 0
3 632000
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 712000
14 0
15 736000
16 856000
17 0
18 736000
19 0
20 0
21 0
22 0
23 0
24 0
25 728000
26 0
27 0
28 760000
Name: size, dtype: int64
Basically I want to plot my df but with time starting from 0 and for every second from 10:06:33 to 10:07:01. Please help! Thanks!
Upvotes: 1
Views: 257
Reputation: 862641
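First convert the index to a Series of strings and parse it into timedeltas with to_timedelta, then get the seconds with Series.dt.total_seconds. Use Series.diff with Series.fillna and a cumulative sum (Series.cumsum) to get the seconds elapsed since the first row, and finally use DataFrame.reindex to add the missing seconds, filled with 0: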
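import pandas as pd

# Parse the "HH:MM:SS" labels as timedeltas and convert them to seconds;
# diff/fillna/cumsum then yields the seconds elapsed since the first row.
df.index = (pd.to_timedelta(df.index.to_series().astype(str))
            .dt.total_seconds()
            .diff()
            .fillna(0)
            .cumsum())
# Add a row for every second in the range; missing seconds get 0.
df = df.reindex(range(int(df.index.max()+1)), fill_value=0)
print(df)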
size
rx
0 928000
1 400000
2 0
3 632000
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 712000
14 0
15 736000
16 856000
17 0
18 736000
19 0
20 0
21 0
22 0
23 0
24 0
25 728000
26 0
27 0
28 760000
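For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question. It is a variation rather than the exact code above: it subtracts the first timestamp instead of using diff and cumsum, which should give the same offsets. It assumes the times are sorted, unique, and all on the same day; the names s, elapsed and filled are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (index name "rx", Series name "size").
times = ["10:06:33", "10:06:34", "10:06:36", "10:06:46", "10:06:48",
         "10:06:49", "10:06:51", "10:06:58", "10:07:01"]
sizes = [928000, 400000, 632000, 712000, 736000,
         856000, 736000, 728000, 760000]
s = pd.Series(sizes, index=pd.Index(times, name="rx"), name="size")

# Seconds elapsed since the first timestamp.
# Assumption: timestamps are sorted, unique, and do not cross midnight.
elapsed = pd.to_timedelta(s.index) - pd.to_timedelta(s.index[0])
s.index = pd.Index(elapsed.total_seconds().astype(int), name="rx")

# One row per second from 0 to the last offset; missing seconds become 0.
filled = s.reindex(range(s.index.max() + 1), fill_value=0)
print(filled)
```

If matplotlib is installed, calling filled.plot() on the result should then draw the values against seconds starting from 0. Note that reindex raises an error on a duplicated index, so if two rows could share the same second they would need to be aggregated first (for example with a groupby sum).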
Upvotes: 2