Reputation: 147
I've got the following pandas.DataFrame
and would like to calculate a new column containing the timedelta between consecutive timestamps in the multi-index level Timestamp
:
import pandas as pd
import numpy as np
data = {'Timestamp': [12, 12, 12, 22, 22, 22, 44, 44, 66, 102],
'Customer': ['bmw', 'vw', 'vw', 'bmw', 'vw', 'vw', 'vw', 'vw', 'bmw', 'bmw'],
'Series': ['series1', 'series1', 'series2', 'series1', 'series1', 'series2', 'series1', 'series2', 'series2', 'series1'],
'time_delta': [np.nan, np.nan, np.nan, 10, 10, 10, 22, 22, 22, 36]
}
df = pd.DataFrame(data).set_index(['Timestamp', 'Customer', 'Series'])
The column time_delta
is the desired output I would like to achieve. I somewhat struggle since I can not use the pandas.Series.diff()
function as the periods
are not consistent. I want to do the timestamp delta calculation on the Timestamp
level of the dataframe, but pass the result to all rows of this level. So for the first Timestamp
level value 12 there is no preceeding timestamp value, thus all rows for this timestamp are filled with np.nan
. For the next timestamp 22, I can take the delta to 12 (which is 10) and fill it for all rows of timestamp 22.
Upvotes: 2
Views: 141
Reputation: 150745
Let's try extracting the level values and calculate the difference from there:
df['time_delta'] = df.index.get_level_values('Timestamp')
s = df['time_delta'].diff()
df['time_delta'] = s.where(s>0).ffill()
Output:
time_delta
Timestamp Customer Series
12 bmw series1 NaN
vw series1 NaN
series2 NaN
22 bmw series1 10.0
vw series1 10.0
series2 10.0
44 vw series1 22.0
series2 22.0
66 bmw series2 22.0
102 bmw series1 36.0
Upvotes: 2