pandas.DataFrame.interpolate() doesn't interpolate or extrapolate time-series data correctly

I'm having trouble with pandas.DataFrame.interpolate(). I have example time-series data in which the data points are roughly 2 minutes apart. I'm trying to resample the data to exactly 2-minute intervals so that I can synchronize it with other data later. The problem is that, as far as I understand, the temperature and humidity values are not interpolated correctly. I tried interpolate(method='time'), interpolate(method='linear') and interpolate(method='index'), but they all gave similar results. What did I do wrong, or what have I misunderstood about this method?

import pandas as pd
import numpy as np

# Generating random data
np.random.seed(0)
num_rows = 20
data = {
    'temperature': np.random.randint(20, 30, num_rows),
    'humidity': np.random.randint(40, 60, num_rows)
}
print(data)
# Generating a random offset (0-119 s) for each row
time_offsets = np.random.randint(0, 120, num_rows)
time_offsets = pd.to_timedelta(time_offsets, unit='s')

# Start time of the series (end_time is not used below)
start_time = pd.Timestamp('2024-02-24 9:55:37')
end_time = pd.Timestamp('2024-02-24 11:00:00')

# Generating time indices for each row
time_indices = [start_time + pd.Timedelta(minutes=2*i) + offset for i, offset in enumerate(time_offsets)]

print(time_indices)
# Creating DataFrame
combined_data = pd.DataFrame(data, index=time_indices)

print("Random DataFrame:")
print(combined_data)


# Resample the data to 2-minute frequency
resampled_data = combined_data.resample('2min').interpolate(method='time')


print("\nResampled DataFrame:")
print(resampled_data)

Below are the results I got. The interpolated DataFrame repeats the same values for the first several rows and then outputs values that look averaged and are nothing like my hand calculation.

Random DataFrame:
                     temperature  humidity
2024-02-24 09:56:00           25        45
2024-02-24 09:57:46           20        53
2024-02-24 10:00:34           23        48
2024-02-24 10:02:09           23        49
2024-02-24 10:04:08           27        59
2024-02-24 10:06:51           29        56
2024-02-24 10:09:33           23        59
2024-02-24 10:10:00           25        45
2024-02-24 10:12:12           22        55
2024-02-24 10:14:52           24        55
2024-02-24 10:17:31           27        40
2024-02-24 10:18:32           26        58
2024-02-24 10:20:05           28        43
2024-02-24 10:22:11           28        57
2024-02-24 10:23:37           21        59
2024-02-24 10:25:37           26        59
2024-02-24 10:28:13           27        59
2024-02-24 10:30:30           27        54
2024-02-24 10:31:42           28        47
2024-02-24 10:34:00           21        40

Resampled DataFrame:
                     temperature   humidity
2024-02-24 09:56:00    25.000000  45.000000
2024-02-24 09:58:00    25.000000  45.000000
2024-02-24 10:00:00    25.000000  45.000000
2024-02-24 10:02:00    25.000000  45.000000
2024-02-24 10:04:00    25.000000  45.000000
2024-02-24 10:06:00    25.000000  45.000000
2024-02-24 10:08:00    25.000000  45.000000
2024-02-24 10:10:00    25.000000  45.000000
2024-02-24 10:12:00    24.666667  44.583333
2024-02-24 10:14:00    24.333333  44.166667
2024-02-24 10:16:00    24.000000  43.750000
2024-02-24 10:18:00    23.666667  43.333333
2024-02-24 10:20:00    23.333333  42.916667
2024-02-24 10:22:00    23.000000  42.500000
2024-02-24 10:24:00    22.666667  42.083333
2024-02-24 10:26:00    22.333333  41.666667
2024-02-24 10:28:00    22.000000  41.250000
2024-02-24 10:30:00    21.666667  40.833333
2024-02-24 10:32:00    21.333333  40.416667
2024-02-24 10:34:00    21.000000  40.000000

Thank you so much!

I tried the different method options of interpolate in pandas. I expected the temperature and humidity values to be interpolated or extrapolated correctly according to their timestamps.

Upvotes: 2

Views: 119

Answers (2)

e-motta

Reputation: 7540

# From OP
resampled_data = combined_data.resample('2min').interpolate(method='time')

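In the pandas version that produced the output in the question, this first reindexes the data onto the 2-minute grid, keeping only the rows whose timestamps fall exactly on it (here 09:56:00, 10:10:00 and 10:34:00), and then interpolates between those rows. Every other row is ignored, which is why the output stays constant up to 10:10:00 and then drifts linearly towards the last row.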
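The pattern in the question's output can be reproduced by calling asfreq() explicitly and then interpolating. This is a minimal sketch, not the library's internal code: the DataFrame below is a hand-copied subset of the rows printed in the question (an assumption made to keep the example short), and the names df, on_grid, survivors and filled are illustrative.

```python
import pandas as pd

# A subset of the rows printed in the question: the three rows that sit
# exactly on a 2-minute boundary, plus three that do not.
df = pd.DataFrame(
    {"temperature": [25, 20, 23, 25, 22, 21],
     "humidity": [45, 53, 48, 45, 55, 40]},
    index=pd.to_datetime([
        "2024-02-24 09:56:00",
        "2024-02-24 09:57:46",
        "2024-02-24 10:00:34",
        "2024-02-24 10:10:00",
        "2024-02-24 10:12:12",
        "2024-02-24 10:34:00",
    ]),
)

# asfreq() reindexes onto the 2-minute grid without aggregating:
# rows that are not exactly on the grid are dropped, empty slots become NaN.
on_grid = df.resample("2min").asfreq()
survivors = on_grid.dropna()
print(survivors)

# Interpolating afterwards only sees the surviving on-grid rows.
filled = on_grid.interpolate(method="time")
print(filled.loc["2024-02-24 10:08:00":"2024-02-24 10:14:00"])
```

Only the 09:56:00, 10:10:00 and 10:34:00 rows survive the reindexing step, so the off-grid readings never take part in the interpolation.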

IIUC, what you want is to resample, take the mean of the values in each 2-minute bin (or the first or last value), and then interpolate the bins that are empty:

resampled_data = combined_data.resample("2min").mean().interpolate("linear")
Resampled DataFrame:
                     temperature  humidity
2024-02-24 09:56:00        22.50     49.00
2024-02-24 09:58:00        22.75     48.50
2024-02-24 10:00:00        23.00     48.00
2024-02-24 10:02:00        23.00     49.00
2024-02-24 10:04:00        27.00     59.00
2024-02-24 10:06:00        29.00     56.00
2024-02-24 10:08:00        23.00     59.00
2024-02-24 10:10:00        25.00     45.00
2024-02-24 10:12:00        22.00     55.00
2024-02-24 10:14:00        24.00     55.00
2024-02-24 10:16:00        27.00     40.00
2024-02-24 10:18:00        26.00     58.00
2024-02-24 10:20:00        28.00     43.00
2024-02-24 10:22:00        24.50     58.00
2024-02-24 10:24:00        26.00     59.00
2024-02-24 10:26:00        26.50     59.00
2024-02-24 10:28:00        27.00     59.00
2024-02-24 10:30:00        27.50     50.50
2024-02-24 10:32:00        24.25     45.25
2024-02-24 10:34:00        21.00     40.00
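If the goal is instead the value each series would have at the exact grid timestamps, one common alternative (not taken from the answers here) is to add the grid points to the original index, interpolate by time, and keep only the grid points. A minimal sketch, assuming the first four temperature rows printed in the question; the names df, grid and on_grid are illustrative.

```python
import pandas as pd

# First four temperature readings as printed in the question.
df = pd.DataFrame(
    {"temperature": [25, 20, 23, 23]},
    index=pd.to_datetime([
        "2024-02-24 09:56:00",
        "2024-02-24 09:57:46",
        "2024-02-24 10:00:34",
        "2024-02-24 10:02:09",
    ]),
)

# Regular 2-minute grid that stays inside the observed time range,
# so nothing has to be extrapolated.
grid = pd.date_range(
    df.index.min().ceil("2min"),
    df.index.max().floor("2min"),
    freq="2min",
)

# Insert the grid timestamps as NaN rows, interpolate using the real
# elapsed time between all original samples, then keep only the grid.
on_grid = (
    df.reindex(df.index.union(grid))
      .interpolate(method="time")
      .reindex(grid)
)
print(on_grid)
```

As a hand check, 09:58:00 lies 14 s after the 09:57:46 reading (20) on the 168 s stretch to the 10:00:34 reading (23), so the interpolated temperature is 20 + 3 * 14 / 168 = 20.25.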

Upvotes: 0

I would consider resampling using the mean of each 2-minute bin and then interpolating the empty bins, as below:

import numpy as np
import pandas as pd


np.random.seed(0)  
num_rows = 20
data = {
    'temperature': np.random.randint(20, 30, num_rows),
    'humidity': np.random.randint(40, 60, num_rows)
}
time_offsets = np.random.randint(0, 120, num_rows) 
time_offsets = pd.to_timedelta(time_offsets, unit='s')
start_time = pd.Timestamp('2024-02-24 9:55:37')
time_indices = [start_time + pd.Timedelta(minutes=2*i) + offset for i, offset in enumerate(time_offsets)]

combined_data = pd.DataFrame(data, index=time_indices)

resampled_data = combined_data.resample('2min').mean() 
interpolated_data = resampled_data.interpolate(method='time')

combined_data, resampled_data.head(10), interpolated_data.head(10) 

Which gives you

(                     temperature  humidity
 2024-02-24 09:56:42           25        45
 2024-02-24 09:57:46           20        53
 2024-02-24 10:00:34           23        48
 2024-02-24 10:02:09           23        49
 2024-02-24 10:04:08           27        59
 2024-02-24 10:06:51           29        56
 2024-02-24 10:09:33           23        59
 2024-02-24 10:10:00           25        45
 2024-02-24 10:12:12           22        55
 2024-02-24 10:14:52           24        55
 2024-02-24 10:17:31           27        40
 2024-02-24 10:18:32           26        58
 2024-02-24 10:20:05           28        43
 2024-02-24 10:22:11           28        57
 2024-02-24 10:23:37           21        59
 2024-02-24 10:25:37           26        59
 2024-02-24 10:28:13           27        59
 2024-02-24 10:30:30           27        54
 2024-02-24 10:31:42           28        47
 2024-02-24 10:34:15           21        40,
                      temperature  humidity
 2024-02-24 09:56:00         22.5      49.0
 2024-02-24 09:58:00          NaN       NaN
 2024-02-24 10:00:00         23.0      48.0
 2024-02-24 10:02:00         23.0      49.0
 2024-02-24 10:04:00         27.0      59.0
 2024-02-24 10:06:00         29.0      56.0
 2024-02-24 10:08:00         23.0      59.0
 2024-02-24 10:10:00         25.0      45.0
 2024-02-24 10:12:00         22.0      55.0
 2024-02-24 10:14:00         24.0      55.0,
                      temperature  humidity
 2024-02-24 09:56:00        22.50      49.0
 2024-02-24 09:58:00        22.75      48.5
 2024-02-24 10:00:00        23.00      48.0
 2024-02-24 10:02:00        23.00      49.0
 2024-02-24 10:04:00        27.00      59.0
 2024-02-24 10:06:00        29.00      56.0
 2024-02-24 10:08:00        23.00      59.0
 2024-02-24 10:10:00        25.00      45.0
 2024-02-24 10:12:00        22.00      55.0
 2024-02-24 10:14:00        24.00      55.0)
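This also suggests why method='time', method='linear' and method='index' gave similar results in the question: once the index is an evenly spaced 2-minute grid, weighting by elapsed time and weighting by row position amount to the same thing. A minimal sketch, assuming a four-row series shaped like the head of the resampled output above; the names idx, s, by_time and by_position are illustrative.

```python
import numpy as np
import pandas as pd

# Evenly spaced 2-minute index with one empty bin, like resample().mean().
idx = pd.date_range("2024-02-24 09:56:00", periods=4, freq="2min")
s = pd.Series([22.5, np.nan, 23.0, 23.0], index=idx)

# 'time' weights by the timestamps, 'linear' treats rows as equally spaced.
by_time = s.interpolate(method="time")
by_position = s.interpolate(method="linear")

print(by_time)
print(np.allclose(by_time, by_position))
```

The two methods only diverge when the index is irregular at the moment interpolate() is called, which is not the case after resampling.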

Upvotes: 0
