Runtime error using MLFlow and Spark on databricks

Here is some model I created:

class SomeModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, input):
        # do fancy ML stuff
        # log results
        pandas_df = pd.DataFrame(...insert predictions here...)
        spark_df = spark.createDataFrame(pandas_df)
        spark_df.write.saveAsTable('tablename', mode='append')

I'm trying to log my model in this manner by calling it later in my code:

with mlflow.start_run(run_name="SomeModel_run"):
    model = SomeModel()
    mlflow.pyfunc.log_model("somemodel", python_model=model)

Unfortunately it gives me this Error Message:

RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

The error is caused because of the line mlflow.pyfunc.log_model("somemodel", python_model=model), if I comment it out my model will make its predictions and log the results in my table.

Alternatively, removing the lines in my predict function where I call spark to create a dataframe and save the table, I am able to log my model.

How do I go about resolving this issue? I need my model to not only write to the table but also be logged

Upvotes: 5

Answers (2)

Gabriel Ceron Viveros

Reputation: 41

If your objective is to log the output of the model in an endpoint, Databricks released a monitoring feature called Inference Tables. When activating them it will log the request and the response of an endpoint. Then you can perform transformations on those tables to supervise the model performance, generate statistics, check model/data drifts, and even create dashboards, there are some examples in the Databricks documentation.

Upvotes: 0

S J

Reputation: 1

I had a similar error and I was able to resolve this by taking spark functions like "spark.createDataFrame(pandas_df)" outside of the class. If you want to read-write data using spark do it within the main function.

    class SomeModel(mlflow.pyfunc.PythonModel):
       def predict(self, context, input):
       # do fancy ML stuff
       # log results
       return predictions

    with mlflow.start_run(run_name="SomeModel_run"):
      model = SomeModel()
      pandas_df = pd.DataFrame(...insert predictions here...)
      mlflow.pyfunc.log_model("somemodel", python_model=model)

Upvotes: 0

Runtime error using MLFlow and Spark on databricks

Answers (2)

Related Questions