Polynomial regression predicted values as dataframe (Python)

Question

Many questions on this topic have already been answered, but I could not figure out one thing.

I have a dataframe and run a regression on it; the results are then stored in new columns of the Test dataframe. To compare methods, I need to do both linear and polynomial regression.

I found a neat way to do this with linear regression, which gives me the predicted values in a new column of Test. But I cannot make it work in the same loop with polynomial regression: the final Test dataframe contains multiple null values, because at the polynomial_features.fit_transform(X) step the values somehow lose their corresponding Test index.

import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

Test = pd.read_csv(r'D:\myfile.csv')

df_coef =[]
value = list(set(Test['Value']))
for value in value:
    df_redux = Test[Test['Value'] == value]

    Y = df_redux['Y']
    X = df_redux[['X1', 'A', 'B', 'G1']]
    X = sm.add_constant(X)

    # linear
    model_1 = sm.OLS(Y, X).fit()
    predictions_1 = model_1.predict(X)

    # polynomial
    polynomial_features = PolynomialFeatures(degree=2)
    xp = polynomial_features.fit_transform(X)
    model_2 = sm.OLS(Y, xp).fit()
    predictions_2 = model_2.predict(xp)

    stats_1 = pd.read_html(model_1.summary().tables[1].as_html(), header=0, index_col=0)[0]
    stats_2 = pd.read_html(model_2.summary().tables[1].as_html(), header=0, index_col=0)[0]

    predictions_1 = pd.DataFrame(predictions_1, columns=['lin'])
    predictions_2 = pd.DataFrame(predictions_2, columns=['poly'])

    # ??? how to concat and append both predictions_1 and predictions_2 to the same df_coef list?
    gf = pd.concat([predictions_1, df_redux], axis=1)
    df_coef.append(gf)

all_coef = pd.concat(df_coef)

type(all_coef)
Out[234]: pandas.core.frame.DataFrame

The problem is that the transformed xp is a numpy.ndarray, whereas X is a pandas DataFrame. How can I get the polynomial regression predictions into a new column of Test, next to the linear regression results? This is probably really simple, but I could not figure it out.

print(type(X))
print(type(xp))
print(X.sample(2))
print()
print(xp)


      X1         A          B          G1
962    4.334912  1.945910  3.135494  3.258097
1365   4.197888  2.197225  3.135494  3.332205
[[ 1.          4.77041663  1.94591015 ... 35.74106743 34.52550933
  33.35129251]
 [ 1.          4.43240629  1.94591015 ... 33.28387641 32.03140262
  30.82605947]
 [ 1.          3.21669428  1.94591015 ... 29.95821572 30.38903979
  30.82605947]

The result I get with the polynomial regression predicted values appended to the original Test dataframe:

0     6.178542     3.0  692  ...  2.079442  4.783216  6.146329
1     6.156108    11.0  692  ...  2.197225  4.842126  6.113682
2     6.071453    12.0  692  ...  2.197225  4.814595  6.052089
3     5.842053     NaN        NaN  ...       NaN       NaN       NaN
4     4.625762    30.0  692  ...  1.945910  5.018201  5.828946

This is the correct result I obtained using only linear regression, without NaN and with a value in each row, as it is supposed to be:

0     6.151675     3  692  5  ...  3.433987  2.079442  4.783216  6.146329
1     6.132077    11  692  5  ...  3.401197  2.197225  4.842126  6.113682
2     6.068450    12  692  5  ...  3.332205  2.197225  4.814595  6.052089
4     5.819535    30  692  5  ...  3.258097  1.945910  5.018201  5.828946
8     4.761362    61  692  5  ...  2.564949  1.945910  3.889585  4.624973
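
For reference, the NaN rows are consistent with how pandas aligns on index labels in pd.concat(..., axis=1): a DataFrame built from a bare NumPy array gets a fresh 0, 1, 2, ... index, while df_redux keeps the row labels it had in Test. The following is only a minimal sketch with made-up numbers (the contents of df_redux and preds here are hypothetical, not the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one df_redux group: note the non-contiguous index.
df_redux = pd.DataFrame({"Y": [6.1, 6.0, 5.8]}, index=[0, 2, 4])

# A NumPy array of predictions carries no index labels at all.
preds = np.array([6.2, 6.1, 5.9])

# Wrapping the array in a DataFrame creates a new index 0, 1, 2,
# so concat pairs rows by label and fills the gaps with NaN.
misaligned = pd.concat([pd.DataFrame(preds, columns=["poly"]), df_redux], axis=1)

# Reusing the group's own index keeps every prediction next to its row.
aligned = pd.concat(
    [pd.DataFrame(preds, columns=["poly"], index=df_redux.index), df_redux],
    axis=1,
)

print(misaligned)
print(aligned)
```

In the misaligned frame, label 1 has a prediction but no data and label 4 has data but no prediction, which is the same pattern as the row of NaN shown above.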

g123456k · Accepted Answer

Solve this by adding a line that converts the NumPy array to a pandas Series with the original index. For model statistics, use the statsmodels summary:

import pandas as pd
from statsmodels.api import OLS

predictions_2 = model_2.predict(xp)
predictions_2_series = pd.Series(predictions_2, index=df_redux.index.values)

print(OLS(Y, xp).fit().summary())
