How can I make a dataframe with duplicate datetime index entries unique?

Question

I have a time series of trading data where the trade timestamp is only granular to the second. Some trades occur within the same second, but I can't get millisecond timestamps. I don't want to drop the duplicates; rather, I would like to make the index more granular by assuming that multiple trades within a second happen 1 millisecond apart (there are at most 10 or 20 trades within one second, so this won't exceed the 1000 milliseconds available per second). For example, I have the following:

8:31:58.000 AM trade1 
8:31:58.000 AM trade2 
8:31:58.000 AM trade3 
8:31:58.000 AM trade4

and would like to generate

8:31:58.000 AM trade1 
8:31:58.001 AM trade2 
8:31:58.002 AM trade3 
8:31:58.003 AM trade4

Alternatively, I am open to other ways to do this. The goal is to preserve the data while having a unique index. Thoughts?

cs95 · Accepted Answer

Assuming this is your input:

df

           trade
08:31:58  trade1
08:31:58  trade2
08:31:58  trade3
08:31:58  trade4

Convert the index to datetime. Then, group by index, get the cumulative count and convert that to timedelta.

Finally, add the results and assign.

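import pandas as pd

dt = pd.to_datetime(df.index, format='%H:%M:%S')  # parse the index as datetimes
delta = pd.to_timedelta(df.groupby(dt).cumcount(), unit='ms')  # 0, 1, 2, ... ms within each second
df.index = (dt + delta.values).time  # keep only the time-of-day part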

df

                  trade
08:31:58         trade1
08:31:58.001000  trade2
08:31:58.002000  trade3
08:31:58.003000  trade4
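
If the index already holds full datetimes rather than time strings (as the question title suggests), the same idea can be applied without the parsing step. This is a minimal sketch, assuming a made-up date and a small sample frame; neither comes from the original data.

```python
import pandas as pd

# Hypothetical sample data: four trades stamped with the same second.
idx = pd.to_datetime(["2024-01-02 08:31:58"] * 4)
df = pd.DataFrame({"trade": ["trade1", "trade2", "trade3", "trade4"]}, index=idx)

# Position of each row within its group of identical timestamps: 0, 1, 2, 3.
offsets = df.groupby(level=0).cumcount()

# Convert the positions to millisecond offsets and add them to the index.
df.index = df.index + pd.to_timedelta(offsets.to_numpy(), unit="ms")

print(df.index.is_unique)
```

Here `groupby(level=0)` groups on the index itself, so no separate `dt` variable is needed. The row order decides which trade gets which offset, so the frame should already be in the intended trade order.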

If you just want a MultiIndex instead, you can use

df.set_index(df.groupby(dt).cumcount(), append=True)

             trade
08:31:58 0  trade1
         1  trade2
         2  trade3
         3  trade4

This leaves the original level unchanged and just adds a second level holding the per-timestamp counter.
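
To see the MultiIndex variant end to end, here is a small self-contained sketch. The dates and the frame are made up for illustration, and the names `counter` and `out` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: three trades in one second, one in the next.
idx = pd.to_datetime(["2024-01-02 08:31:58"] * 3 + ["2024-01-02 08:31:59"])
df = pd.DataFrame({"trade": ["trade1", "trade2", "trade3", "trade4"]}, index=idx)

# Counter that restarts at 0 for every distinct timestamp.
counter = df.groupby(level=0).cumcount().to_numpy()

# Append the counter as a second index level; the timestamps stay untouched.
out = df.set_index(counter, append=True)

print(out.index.is_unique)
```

Because the timestamps are not altered, the original second-level resolution can still be recovered with `out.index.get_level_values(0)`.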
