Time series forecasting with SVR in scikit-learn

Question

I have a data set of daily temperatures indexed by date, and I need to predict future temperatures using [SVR][1] in scikit-learn.

I'm stuck on selecting the X and Y for the training set and the X for the test set. For example, if I want to predict Y at time t, then the training set needs to contain X and Y at t-1, t-2, ..., t-N, where N is the number of previous days used to predict Y at t.

How can I do that?

Here is what I have so far:

from sklearn.svm import SVR

df = daily_temp1
# define a function to create N lags of every column
def create_lags(df, N):
    for i in range(N):
        df['datetime' + str(i+1)] = df.datetime.shift(i+1)
        df['dewpoint' + str(i+1)] = df.dewpoint.shift(i+1)
        df['humidity' + str(i+1)] = df.humidity.shift(i+1)
        df['pressure' + str(i+1)] = df.pressure.shift(i+1)
        df['temperature' + str(i+1)] = df.temperature.shift(i+1)
        df['vism' + str(i+1)] = df.vism.shift(i+1)
        df['wind_direcd' + str(i+1)] = df.wind_direcd.shift(i+1)
        df['wind_speed' + str(i+1)] = df.wind_speed.shift(i+1)
        df['wind_direct' + str(i+1)] = df.wind_direct.shift(i+1)

    return df

# create 10 lags
df = create_lags(df,10)


# the first 10 days will have missing values. can't use them.
df = df.dropna()

# create X and y
y = df['temperature']
X = df.iloc[:, 9:]

# Train on 70% of the data
train_idx = int(len(df) * .7)

# create train and test data
X_train, y_train, X_test, y_test = X[:train_idx], y[:train_idx], X[train_idx:], y[train_idx:]


# fit and predict
clf = SVR()
clf.fit(X_train, y_train)

clf.predict(X_test)

Ted Petrou · Accepted Answer

Here's a solution that builds the feature matrix X simply as Lag1 through LagN, where Lag1 is the previous day's temperature and LagN is the temperature N days ago.

import numpy as np
import pandas as pd
from sklearn.svm import SVR

# create fake temperature data
df = pd.DataFrame({'temp': np.random.rand(500)})

# define a function to create N lags
def create_lags(df, N):
    for i in range(N):
        df['Lag' + str(i+1)] = df.temp.shift(i+1)
    return df

# create 10 lags
df = create_lags(df,10)

# the first 10 days will have missing values. can't use them.
df = df.dropna()

# create X and y
y = df.temp.values
X = df.iloc[:, 1:].values

# Train on 70% of the data
train_idx = int(len(df) * .7)

# create train and test data (the test targets come from the same rows as X_test)
X_train, y_train, X_test, y_test = X[:train_idx], y[:train_idx], X[train_idx:], y[train_idx:]

# fit and predict
clf = SVR()
clf.fit(X_train, y_train)

clf.predict(X_test)
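
To check how well the lag features work, and to get a forecast for the day after the data ends, the approach above can be extended roughly as follows. This is only a sketch under assumptions that are not part of the answer: the series is synthetic, `N_LAGS` and the other variable names are made up for illustration, and the `StandardScaler` step is an addition (SVR tends to be sensitive to feature scale, so scaling is commonly recommended, but it is not required by the answer's method).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

N_LAGS = 10  # hypothetical name; same number of lags as above

# Synthetic stand-in for a daily temperature series (an assumption, not real data).
rng = np.random.default_rng(0)
days = np.arange(500)
temp = pd.Series(15 + 10 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1, days.size))

# Column "Lag k" holds the temperature k days before the row's own day.
# Rows that lack a full history contain NaN and are dropped.
lags = pd.concat({f"Lag{k}": temp.shift(k) for k in range(1, N_LAGS + 1)}, axis=1).dropna()
X = lags.values
y = temp.loc[lags.index].values

# Chronological split: no shuffling, so the test rows come after the training rows.
train_idx = int(len(X) * 0.7)
X_train, X_test = X[:train_idx], X[train_idx:]
y_train, y_test = y[:train_idx], y[train_idx:]

# Standardize the lag features, then fit the SVR (scaling is an added assumption).
model = make_pipeline(StandardScaler(), SVR())
model.fit(X_train, y_train)

# Compare predictions with the held-out targets.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: {mae:.2f}")

# One-step-ahead forecast for the first day after the data ends:
# Lag1 is the last observed value, Lag2 the one before it, and so on.
next_features = temp.values[-N_LAGS:][::-1].reshape(1, -1)
next_day = model.predict(next_features)[0]
print(f"Forecast for the next day: {next_day:.2f}")
```

The feature row for an unseen day is built from the most recent N observed temperatures, which is why no future values are needed. To forecast further ahead, one option is to append each prediction to the series and repeat the last step, keeping in mind that errors accumulate with every step.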
