Reputation: 191
I have Normalize my data and apply regression analysis to predict yield(y). but my predicted output also gives in normalized (in 0 to 1) I want my predicted answer in my correct data numbers,not in 0 to 1.
Data:
Total_yield(y) Rain(x)
64799.30 720.1
77232.40 382.9
88487.70 1198.2
77338.20 341.4
145602.05 406.4
67680.50 325.8
84536.20 791.8
99854.00 748.6
65939.90 1552.6
61622.80 1357.7
66439.60 344.3
Next,I have normalize data using this code :
from sklearn.preprocessing import Normalizer
import pandas
import numpy
dataframe = pandas.read_csv('/home/desktop/yield.csv')
array = dataframe.values
X = array[:,0:2]
scaler = Normalizer().fit(X)
normalizedX = scaler.transform(X)
print(normalizedX)
Total_yield Rain
0 0.999904 0.013858
1 0.999782 0.020872
2 0.999960 0.008924
3 0.999967 0.008092
4 0.999966 0.008199
5 0.999972 0.007481
6 0.999915 0.013026
7 0.999942 0.010758
8 0.999946 0.010414
9 0.999984 0.005627
10 0.999967 0.008167
Next, I use this normalize value to calculate R-sqaure using following code :
array=normalizedX
data = pandas.DataFrame(array,columns=['Total_yield','Rain'])
import statsmodels.formula.api as smf
lm = smf.ols(formula='Total_yield ~ Rain', data=data).fit()
lm.summary()
Output :
<class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: Total_yield R-squared: 0.752
Model: OLS Adj. R-squared: 0.752
Method: Least Squares F-statistic: 1066.
Date: Thu, 09 Feb 2017 Prob (F-statistic): 2.16e-108
Time: 14:21:21 Log-Likelihood: 941.53
No. Observations: 353 AIC: -1879.
Df Residuals: 351 BIC: -1871.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept 1.0116 0.001 948.719 0.000 1.009 1.014
Rain -0.3013 0.009 -32.647 0.000 -0.319 -0.283
==============================================================================
Omnibus: 408.798 Durbin-Watson: 1.741
Prob(Omnibus): 0.000 Jarque-Bera (JB): 40636.533
Skew: -4.955 Prob(JB): 0.00
Kurtosis: 54.620 Cond. No. 10.3
==============================================================================
Now, R-square = 0.75 ,
regression model : y = b0 + b1 *x
Yield = b0 + b1 * Rain
Yield = intercept + coefficient for Rain * Rain
Now when I use my data value for Rain data then it will gives this answer :
Yield = 1.0116 + ( -0.3013 * 720.1(mm)) = -215.95
-215.95yield is wrong,
And when I use normalize value for rain data then predicted yield comes in normalize value in between 0 to 1.
I want predict if rainfall will be 720.1 mm then how many yield will be there?
If anyone help me how to get predicted yield ? I want to compare Predicted yield vs given yield.
Upvotes: 1
Views: 7286
Reputation: 36609
First, you should not use Normalizer in this case. It doesn't normalize across features. It does it along rows. You may not want it.
Use MinMaxScaler or RobustScaler to scale each feature. See the preprocessing docs for more details.
Second, these classes have a inverse_transform()
function which can convert the predicted y value back to original units.
x = np.asarray([720.1,382.9,1198.2,341.4,406.4,325.8,
791.8,748.6,1552.6,1357.7,344.3]).reshape(-1,1)
y = np.asarray([64799.30,77232.40,88487.70,77338.20,145602.05,67680.50,
84536.20,99854.00,65939.90,61622.80,66439.60]).reshape(-1,1)
scalerx = RobustScaler()
x_scaled = scalerx.fit_transform(x)
scalery = RobustScaler()
y_scaled = scalery.fit_transform(y)
Call your statsmodel.OLS
on these scaled data.
While predicting, first transform your test data:
x_scaled_test = scalerx.transform([720.1])
Apply your regression model on this value and get the result. This result of y will be according to the scaled data.
Yield_scaled = b0 + b1 * x_scaled_test
So inverse transform it to get data in original units.
Yield_original = scalery.inverse_transform(Yield_scaled)
But in my opinion, this linear model will not give much accuracy, because when I plotted your data, this is the result.
This data will not be fitted with linear models. Use other techniques, or get more data.
Upvotes: 6