user1584253

Reputation: 1005

RandomForest Regressor: Predict and check performance

I am trying to predict the price 5 days into the future. I followed this tutorial, which predicts a categorical variable and therefore uses a RandomForest Classifier. I am using the same approach but with a RandomForest Regressor, since I have to predict the last price 5 days ahead. I am confused about how to make the prediction.

Here is my code:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import roc_curve, auc, roc_auc_score

# read train data set
priceTrainData = pd.read_csv('trainPriceData.csv')

# read test data set
priceTestData = pd.read_csv('testPriceData.csv')
priceTrainData['Type'] = 'Train'
priceTestData['Type'] = 'Test'

target_col = "last"

# note: 'last' appears here and is also the target column
features = ['low', 'high', 'open', 'last', 'annualized_volatility', 'weekly_return',
            'daily_average_volume_10',  # try to use log in 10, 30,
            'daily_average_volume_30', 'market_cap']

# randomly hold out roughly 25% of the training rows for validation
priceTrainData['is_train'] = np.random.uniform(0, 1, len(priceTrainData)) <= .75
Train = priceTrainData[priceTrainData['is_train']]
Validate = priceTrainData[~priceTrainData['is_train']]

x_train = Train[features].values
y_train = Train[target_col].values
x_validate = Validate[features].values
y_validate = Validate[target_col].values
x_test = priceTestData[features].values

# seed NumPy's global RNG, which the forest uses when random_state is not set
np.random.seed(100)

rf = RandomForestRegressor(n_estimators=1000)
rf.fit(x_train, y_train)
status = rf.predict(x_validate)

My first question is how to get 5 predicted values, and my second is how to check the performance of the RandomForest Regressor. Kindly assist me.

Upvotes: 0

Views: 1320

Answers (1)

Chandan

Reputation: 772

Your x_validate is a NumPy array (it comes from .values), so you can slice it. To get five predictions, pass five rows to predict: rf.predict(x_validate[0:5])
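A minimal sketch of the slicing idea, using synthetic arrays in place of the CSV data from the question (the shapes, coefficients and the smaller n_estimators are assumptions for illustration only). It shows that predict returns one value per input row, so five rows give five predictions. Note that these are predictions for five validation rows, not necessarily a forecast five days ahead; that depends on how the target column is built.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(100)

# Synthetic stand-ins for x_train, y_train and x_validate (assumed shapes).
x_train = rng.uniform(0, 1, size=(200, 4))
y_train = x_train @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(0, 0.1, 200)
x_validate = rng.uniform(0, 1, size=(50, 4))

rf = RandomForestRegressor(n_estimators=50, random_state=100)
rf.fit(x_train, y_train)

# predict() returns one value per input row: five rows in, five values out.
first_five = rf.predict(x_validate[0:5])
print(first_five.shape)
```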

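For your 2nd question, rf.score returns the R square value: rf.score(x_train, y_train). That call measures the fit on the training data; scoring the held-out rows with rf.score(x_validate, y_validate) gives a better estimate of performance on unseen data.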
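As a rough sketch of checking performance on held-out data, the example below computes R square and RMSE on a validation split. The data is synthetic and the 75/25 split mirrors the one in the question; the variable names and n_estimators value are assumptions, not the asker's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.RandomState(100)

# Synthetic regression data standing in for the price features and target.
X = rng.uniform(0, 1, size=(400, 4))
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(0, 0.1, 400)

# Random split: roughly 75% train, 25% validation.
is_train = rng.uniform(0, 1, len(X)) <= 0.75
x_train, y_train = X[is_train], y[is_train]
x_validate, y_validate = X[~is_train], y[~is_train]

rf = RandomForestRegressor(n_estimators=50, random_state=100)
rf.fit(x_train, y_train)
pred = rf.predict(x_validate)

# R square on held-out rows; this matches rf.score(x_validate, y_validate).
r2 = r2_score(y_validate, pred)
# Root mean squared error, in the same units as the target.
rmse = np.sqrt(mean_squared_error(y_validate, pred))
print(f"validation R^2: {r2:.3f}, RMSE: {rmse:.3f}")
```

R square is unitless, while RMSE is in the units of the target (here, price), so reporting both can make the error easier to interpret.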

Upvotes: 1
