How to manipulate index in dataframe

Question

I have a dataframe like this:

rx
10:06:33    928000
10:06:34    400000
10:06:36    632000
10:06:46    712000
10:06:48    736000
10:06:49    856000
10:06:51    736000
10:06:58    728000
10:07:01    760000
Name: size, dtype: int64

This is what my index looks like:

Index([u'10:06:33', u'10:06:34', u'10:06:36', u'10:06:46', u'10:06:48',
       u'10:06:49', u'10:06:51', u'10:06:58', u'10:07:01'],
      dtype='object', name=u'rx')

I want my index to start from 0 while preserving the time differences. So 10:06:33 should become 0, 10:06:34 should become 1, and 10:06:36 should become 3. As you can see, some seconds are "missing" in between, which is why I cannot simply use df = df.reset_index(drop=True). The data column should also have the value 0 wherever a second is missing, so my dataframe should look something like this:

rx
0    928000
1    400000
2    0
3    632000
4    0
5    0
6    0
7    0
8    0
9    0
10   0
11   0
12   0
13   712000
14   0
15   736000
16   856000
17   0
18   736000
19   0
20   0
21   0
22   0
23   0
24   0
25   728000
26   0
27   0
28   760000
Name: size, dtype: int64

Basically I want to plot my df but with time starting from 0 and for every second from 10:06:33 to 10:07:01. Please help! Thanks!

jezrael · Accepted Answer

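First convert the index values to strings and then to timedeltas with to_timedelta (going through Index.to_series so the .dt accessor is available). Get the seconds with Series.dt.total_seconds, take the differences between consecutive rows with Series.diff, replace the leading NaN with Series.fillna, and accumulate with Series.cumsum so the index counts seconds from 0. Finally, use DataFrame.reindex to add the missing seconds with the value 0: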

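import pandas as pd

# Seconds elapsed since the first timestamp: 0, 1, 3, 13, ...
df.index = (pd.to_timedelta(df.index.to_series().astype(str))
              .dt.total_seconds()
              .diff()
              .fillna(0)
              .cumsum()
              .astype(int))

# Add a row for every missing second, filled with 0
df = df.reindex(range(df.index.max() + 1), fill_value=0)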
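To check the approach end to end, here is a minimal self-contained sketch that rebuilds the data from the question. It assumes the object is a Series named size with an index named rx, as the printed output suggests. It also uses a slightly different route than diff plus cumsum: it subtracts the first timestamp from every timestamp, which should give the same second offsets. The variable names (times, values, s, result) are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumed to be a Series)
times = ["10:06:33", "10:06:34", "10:06:36", "10:06:46", "10:06:48",
         "10:06:49", "10:06:51", "10:06:58", "10:07:01"]
values = [928000, 400000, 632000, 712000, 736000,
          856000, 736000, 728000, 760000]
s = pd.Series(values, index=pd.Index(times, name="rx"), name="size")

# Parse the time strings and measure each one against the first timestamp
td = pd.to_timedelta(s.index)
offsets = (td - td[0]).total_seconds().astype(int)

# Use the integer second offsets as the new index
s.index = pd.Index(offsets, name="rx")

# Insert every missing second between the first and last offset with value 0
result = s.reindex(range(s.index.max() + 1), fill_value=0)

print(result)
```

The result has one row per second from 0 to 28, so it can be passed straight to result.plot() to draw the series against elapsed seconds.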
