Reputation: 368
I am trying to resample a time series that contains duplicate timestamps. I want to resample it so that there is a time point every 0.1 seconds, inserting rows of NaN values at the newly created time points while keeping the existing rows as they are.
import pandas as pd
import numpy as np
d1 = ({
'Value' : ['A','A',np.nan,np.nan,'B','B','B'],
'Other' : ['X','X',np.nan,np.nan,'X','X',np.nan],
'Col' : [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.4','2019-08-02 09:50:10.7','2019-08-02 09:50:10.7','2019-08-02 09:50:10.7','2019-08-02 09:50:10.8'],
'Count' : [1,1,np.nan,5,6,7,8],
})
df1 = pd.DataFrame(data = d1)
df1['Time'] = pd.to_datetime(df1['Time'])
df1 = (df1.set_index(['Time', df1.groupby('Time').cumcount()])
.unstack()
.asfreq('0.1S', method ='pad')
.stack()
.reset_index(level=1, drop=True)
.sort_index()
.reset_index())
Output:
Time Value Other Col Count
0 2019-08-02 09:50:10.100 A X NaN 1.0
1 2019-08-02 09:50:10.200 A X NaN 1.0
2 2019-08-02 09:50:10.300 A X NaN 1.0
3 2019-08-02 09:50:10.700 NaN NaN NaN 5.0
4 2019-08-02 09:50:10.700 B X NaN 6.0
5 2019-08-02 09:50:10.700 B X NaN 7.0
6 2019-08-02 09:50:10.800 B NaN NaN 8.0
Intended Output:
Time Value Other Col Count
0 2019-08-02 09:50:10.100 A X NaN 1.0
1 2019-08-02 09:50:10.200 A X NaN 1.0
2 2019-08-02 09:50:10.300 NaN NaN NaN NaN
3 2019-08-02 09:50:10.400 NaN NaN NaN NaN
4 2019-08-02 09:50:10.500 NaN NaN NaN NaN
5 2019-08-02 09:50:10.600 NaN NaN NaN NaN
6 2019-08-02 09:50:10.700 NaN NaN NaN 5.0
7 2019-08-02 09:50:10.700 B X NaN 6.0
8 2019-08-02 09:50:10.700 B X NaN 7.0
9 2019-08-02 09:50:10.800 B NaN NaN 8.0
Upvotes: 2
Views: 868
Reputation: 22503
The problem is that stack() defaults to dropna=True. You can pass dropna=False and then filter out the unwanted duplicate rows with a boolean mask built from duplicated:
df1 = (df1.set_index(['Time', df1.groupby('Time').cumcount()])
.unstack()
.asfreq('0.1S', method ='pad')
.stack(dropna=False) #change dropna to False
.reset_index(level=1, drop=True)
.sort_index()
.reset_index())
print (df1[~df1.duplicated(["Value","Other","Col","Time","Count"], keep=False)|~df1['Time'].duplicated(keep='first')])
Time Value Other Col Count
0 2019-08-02 09:50:10.100 A X NaN 1.0
3 2019-08-02 09:50:10.200 A X NaN 1.0
6 2019-08-02 09:50:10.300 A X NaN 1.0
9 2019-08-02 09:50:10.400 NaN NaN NaN NaN
12 2019-08-02 09:50:10.500 NaN NaN NaN NaN
15 2019-08-02 09:50:10.600 NaN NaN NaN NaN
18 2019-08-02 09:50:10.700 NaN NaN NaN 5.0
19 2019-08-02 09:50:10.700 B X NaN 6.0
20 2019-08-02 09:50:10.700 B X NaN 7.0
21 2019-08-02 09:50:10.800 B NaN NaN 8.0
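To see why dropna=False matters here, this is a minimal sketch (the frame `wide` and its values are illustrative, not the question's data): after unstack/asfreq, the filler timestamps hold only NaNs, and the default stack() silently drops those all-NaN rows. Note that in recent pandas versions the dropna argument to stack is deprecated in favour of the new stack implementation.

```python
import pandas as pd
import numpy as np

# Illustrative wide frame, as produced by unstack(): two-level columns,
# one sub-column per duplicate slot, and a row that is entirely NaN.
wide = pd.DataFrame(
    {('Count', 0): [1.0, np.nan], ('Count', 1): [np.nan, np.nan]},
    index=pd.to_datetime(['2019-08-02 09:50:10.1', '2019-08-02 09:50:10.3']),
)

dropped = wide.stack()             # default dropna=True: all-NaN rows vanish
kept = wide.stack(dropna=False)    # all-NaN rows survive as NaN rows

print(len(dropped), len(kept))     # the filler timestamp only appears in `kept`
```

With dropna=True only the single real value remains (1 row); with dropna=False every (timestamp, slot) pair is kept (4 rows), which is what lets the 09:50:10.3 filler row survive into the final output.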
Upvotes: 2
Reputation: 71580
Try using:
df1 = (df1.set_index(['Time', df1.groupby('Time').cumcount()])
.unstack()
.asfreq('100ms', method ='pad')
.stack()
.reset_index(level=1, drop=True)
.sort_index()
.reset_index())
dr = pd.date_range(df1['Time'].iloc[0], df1['Time'].iloc[-1], freq='100ms')
df2 = pd.DataFrame({'Time': dr[~dr.isin(df1['Time'])]}, columns = df1.columns)
print(pd.concat([df1,df2]).sort_values('Time').reset_index(drop=True))
Output:
Time Col Count Other Value
0 2019-08-02 09:50:10.100 NaN 1.0 X A
1 2019-08-02 09:50:10.200 NaN 1.0 X A
2 2019-08-02 09:50:10.300 NaN 1.0 X A
3 2019-08-02 09:50:10.400 NaN NaN NaN NaN
4 2019-08-02 09:50:10.500 NaN NaN NaN NaN
5 2019-08-02 09:50:10.600 NaN NaN NaN NaN
6 2019-08-02 09:50:10.700 NaN 5.0 NaN NaN
7 2019-08-02 09:50:10.700 NaN 6.0 X B
8 2019-08-02 09:50:10.700 NaN 7.0 X B
9 2019-08-02 09:50:10.800 NaN 8.0 NaN B
As you can see, I added the last three lines of code. I create a new data frame df2 containing the datetimes that are not already in df1, with all the other columns set to NaN. At the end, I concatenate the two data frames, sort by the datetime, and reset the index.
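The same fill-the-gaps step can also be written with DatetimeIndex.difference instead of a boolean isin mask. A minimal sketch on a trimmed version of the question's data (the names `grid`, `missing`, `filler`, and `out` are illustrative):

```python
import pandas as pd

# Trimmed stand-in for df1: duplicate timestamps, one value column.
df1 = pd.DataFrame({
    'Time': pd.to_datetime(['2019-08-02 09:50:10.1',
                            '2019-08-02 09:50:10.2',
                            '2019-08-02 09:50:10.7',
                            '2019-08-02 09:50:10.7']),
    'Count': [1, 1, 5, 6],
})

# Full 100ms grid from first to last timestamp, then keep only the
# timestamps that df1 does not already contain.
grid = pd.date_range(df1['Time'].min(), df1['Time'].max(), freq='100ms')
missing = grid.difference(df1['Time'])

# Rows of NaN for the missing timestamps, with df1's column layout.
filler = pd.DataFrame({'Time': missing}).reindex(columns=df1.columns)

out = (pd.concat([df1, filler])
         .sort_values('Time')
         .reset_index(drop=True))
print(out)
```

Here the grid spans 09:50:10.1 to 09:50:10.7 (7 points), four of which are missing from df1, so out has the original 4 rows plus 4 NaN filler rows. difference deduplicates and sorts for you, which makes the intent a little more explicit than the ~dr.isin(...) mask.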
Upvotes: 2