I am trying to use LOWESS to smooth the following data:
I would like to obtain a smooth line that filters out the spikes in the data. My code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import HourLocator, DayLocator, DateFormatter
from statsmodels.nonparametric.smoothers_lowess import lowess
file = r'C:...'
df = pd.read_csv(file) # reads data file
df['Date'] = pd.to_datetime(df['Time Local'], format='%d/%m/%Y %H:%M')
x = df['Date']
y1 = df['CTk2 Level']
filtered = lowess(y1, x, is_sorted=True, frac=0.025, it=0)
plt.plot(x, y1, 'r')
plt.plot(filtered[:,0], filtered[:,1], 'b')
plt.show()
When I run this code, I get the following error:
ValueError: view limit minimum -7.641460199922635e+16 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
The dates in my data are in the format 07/05/2018 00:07:00. I think the issue is that LOWESS is struggling to work with the datetime data, but I am not sure.
Can you please help me?
Upvotes: 10
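lowess doesn't preserve the datetime dtype of x. It converts the x values to floats, so the dates come back as nanoseconds since the epoch, which Matplotlib then rejects on a date axis. It also returns a single two-column array (x in column 0, smoothed y in column 1), so transpose it before unpacking. Luckily it is easy to convert back:
smoothedx, smoothedy = lowess(y1, x, is_sorted=True, frac=0.025, it=0).T
# the floats are nanoseconds since the epoch, so go via int64 to datetime64[ns]
smoothedx = smoothedx.astype('int64').astype('datetime64[ns]')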
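For anyone who wants to try the round trip without the original CSV, here is a minimal sketch on made-up data. The synthetic series, the spike positions, and the names `dates`, `level`, `x_ns`, `smoothed_x`, and `smoothed_y` are all assumptions for illustration and do not come from the question. It passes the dates to `lowess` as explicit int64 nanoseconds rather than relying on the implicit conversion, then turns column 0 of the result back into datetimes.

```python
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic stand-in for the question's data (hypothetical values):
# one reading per minute, with an artificial spike every 50 samples.
rng = np.random.default_rng(0)
dates = pd.Series(pd.date_range("2018-05-07 00:07", periods=500, freq="min"))
level = pd.Series(np.sin(np.linspace(0, 6, 500)) + 0.05 * rng.standard_normal(500))
level.iloc[::50] += 2.0

# Give lowess plain numbers: nanoseconds since the epoch as int64.
x_ns = dates.to_numpy().astype("datetime64[ns]").astype("int64")

# Same smoothing settings as in the question.
filtered = lowess(level.to_numpy(), x_ns, is_sorted=True, frac=0.025, it=0)

# Column 0 holds x as floats (nanoseconds), column 1 the smoothed y values.
smoothed_x = pd.to_datetime(filtered[:, 0].astype("int64"))
smoothed_y = filtered[:, 1]
```

After this, `plt.plot(dates, level, 'r')` and `plt.plot(smoothed_x, smoothed_y, 'b')` should share a date axis, since both x arrays are datetimes again.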
Upvotes: 6