Reputation: 77
I have a time series of trading data where the trade timestamp is only granular down to the second. Some trades occur within the same second, but I can't get millisecond timestamps. I don't want to get rid of the duplicates; rather, I would like to make the index more granular by assuming that multiple trades within a second happen 1 millisecond apart (there are at most 10 or 20 trades within one second, so this won't exceed the 1000 milliseconds available per second). For example, I have the following:
8:31:58.000 AM trade1
8:31:58.000 AM trade2
8:31:58.000 AM trade3
8:31:58.000 AM trade4
and would like to generate
8:31:58.000 AM trade1
8:31:58.001 AM trade2
8:31:58.002 AM trade3
8:31:58.003 AM trade4
Alternatively I am open to other ways to do this - the idea is to preserve the data at the same time as having a unique index. Thoughts?
Upvotes: 3
Views: 3441
Reputation: 402553
Assuming this is your input:
df
trade
08:31:58 trade1
08:31:58 trade2
08:31:58 trade3
08:31:58 trade4
Convert the index to datetime. Then group by the index, get the cumulative count, and convert that to timedelta. Finally, add the two results and assign them back to the index.
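import pandas as pd
dt = pd.to_datetime(df.index, format='%H:%M:%S')
# 0, 1, 2, ... within each group of identical timestamps, as millisecond offsets
delta = pd.to_timedelta(df.groupby(dt).cumcount(), unit='ms')
# Add the offsets, then keep only the time-of-day part
df.index = (dt + delta.values).time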
df
trade
08:31:58 trade1
08:31:58.001000 trade2
08:31:58.002000 trade3
08:31:58.003000 trade4
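For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same steps. The sample frame (including the extra `trade5` row at 08:31:59) is made up for illustration, and the guard on the per-second count is my own addition based on the "fewer than 1000 trades per second" assumption in the question.

```python
import pandas as pd

# Hypothetical sample data: four trades share one second, a fifth is a second later.
df = pd.DataFrame(
    {"trade": ["trade1", "trade2", "trade3", "trade4", "trade5"]},
    index=["08:31:58", "08:31:58", "08:31:58", "08:31:58", "08:31:59"],
)

# Parse the time strings (the date part defaults to 1900-01-01).
dt = pd.to_datetime(df.index, format="%H:%M:%S")

# Running position of each row within its group of identical timestamps.
position = df.groupby(dt).cumcount().to_numpy()

# Guard (an added assumption): offsets must stay below one second.
if position.max() >= 1000:
    raise ValueError("more than 1000 trades in one second")

# Turn positions into millisecond offsets and add them to the timestamps.
unique_index = dt + pd.to_timedelta(position, unit="ms")

# Keep only the time-of-day part, as in the answer above.
df.index = unique_index.time
print(df)
```

If you would rather keep a real `DatetimeIndex` (for resampling and the like), assigning `unique_index` directly instead of `unique_index.time` should work too.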
If you just want a MultiIndex instead, you can use
df.set_index(df.groupby(dt).cumcount(), append=True)
trade
08:31:58 0 trade1
1 trade2
2 trade3
3 trade4
This does not modify the existing index level; it just adds a new one.
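A self-contained sketch of the MultiIndex variant, again on made-up sample data. The level name `seq` is a hypothetical label I chose for the counter, and grouping with `level=0` is just a shortcut for grouping on the existing index.

```python
import pandas as pd

# Hypothetical sample data with duplicate second-level timestamps.
df = pd.DataFrame(
    {"trade": ["trade1", "trade2", "trade3", "trade4", "trade5"]},
    index=["08:31:58", "08:31:58", "08:31:58", "08:31:58", "08:31:59"],
)

# Counter that restarts at 0 for every distinct timestamp.
seq = df.groupby(level=0).cumcount().rename("seq")

# Append the counter as a second index level; the original level is untouched.
out = df.set_index(seq, append=True)
print(out)

# Individual trades can then be addressed by (timestamp, position).
third = out.loc[("08:31:58", 2), "trade"]
```

This keeps the original timestamps exactly as recorded, which may be preferable if you don't want to invent sub-second times.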
Upvotes: 7