Reputation: 425
I am trying to run the script below in PySpark3 and am receiving the error message that follows. I am guessing this has something to do with formatting, but I am not sure how to fix it. Any help would be much appreciated.
train,test = df.randomSplit([0.7,0.3])
models = ["LinearRegression()","DecisionTreeRegressor()","RandomForestRegressor()","GBTRegressor()"]
for model in models:
    # Fit our model
    M = model
    fitModel = M.fit(train)
    # Load the Summary
    trainingSummary = fitModel.summary
    # trainingSummary.residuals.show()
    print("Training RMSE: %f" % trainingSummary.rootMeanSquaredError)
    print("Training r2: %f" % trainingSummary.r2)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-d0b941a7170e> in <module>()
8 # Fit our model
9 M = model
---> 10 fitModel = M.fit(train)
11
12 # Load the Summary
AttributeError: 'str' object has no attribute 'fit'
Upvotes: 0
Views: 1751
Reputation: 425
I think this way is actually cleaner: put the model instances themselves (not strings) in the list and pass each one to a helper function.
This way you can still iterate through a list.
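def ClassTrainEval(model):
    # Fit the estimator on the training split
    fitModel = model.fit(train)
    # Load the Summary
    # Note: only some fitted models expose .summary with these fields (LinearRegressionModel does)
    trainingSummary = fitModel.summary
    print("Training RMSE: %f" % trainingSummary.rootMeanSquaredError)
    print("Training r2: %f" % trainingSummary.r2)
# The list holds estimator instances, not strings
models = [LogisticRegression(),NaiveBayes(),OneVsRest(),LinearSVC()]
for model in models:
    ClassTrainEval(model)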
Upvotes: 1
Reputation: 1005
You have to change your models list:
models = [LinearRegression, DecisionTreeRegressor, RandomForestRegressor, GBTRegressor]
With your current definition, every element is a string. A better way is to put the classes themselves in the list and create an instance on each pass through the loop.
To be precise, in models = ["LinearRegression()"]
the element is a str, not an estimator object, so it has no fit method.
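To make the difference between a string, a class, and an instance concrete, here is a minimal sketch that runs without Spark. MeanRegressor and MedianRegressor are hypothetical stand-ins invented for this example (they are not PySpark classes), and their fit simply returns self, whereas a real PySpark estimator returns a separate fitted model object.

```python
# Hypothetical stand-in estimators, used only so the sketch runs without Spark.
class MeanRegressor:
    """Toy estimator that stores the mean of the training targets."""
    def fit(self, targets):
        self.mean_ = sum(targets) / len(targets)
        return self  # assumption: a real PySpark fit() returns a separate model


class MedianRegressor:
    """Toy estimator that stores the median of the training targets."""
    def fit(self, targets):
        ordered = sorted(targets)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            self.median_ = ordered[mid]
        else:
            self.median_ = (ordered[mid - 1] + ordered[mid]) / 2
        return self


train = [1.0, 2.0, 6.0]

# A string only looks like a constructor call; it is a str and has no fit method.
string_has_fit = hasattr("MeanRegressor()", "fit")

# Put the classes in the list and instantiate one per iteration.
model_classes = [MeanRegressor, MedianRegressor]
fitted = {}
for model_class in model_classes:
    estimator = model_class()  # create a fresh instance
    fitted[model_class.__name__] = estimator.fit(train)
```

The same loop shape should carry over to the regressors in the question: replace the stand-ins with the imported PySpark classes and pass the training DataFrame to fit.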
Upvotes: 1