Bentech

Reputation: 498

Why does PySpark execute only the default statement in my custom `SQLTransformer`?

I wrote a custom SQLTransformer in PySpark. Setting a default SQL statement is mandatory for the code to execute at all. I can save the custom transformer from Python, then load and run it from Scala and/or Python, but only the default statement is executed, even though the `_transform` method does something else. I get the same result in both languages, so the problem is not related to the `_to_java` method or the `JavaTransformer` class.

from pyspark.ml.feature import SQLTransformer

class filter(SQLTransformer):  # note: this name shadows the built-in filter()
    def __init__(self):
        super(filter, self).__init__()
        # a default statement is mandatory, otherwise nothing runs at all
        self._setDefault(statement="select text, label from __THIS__")

    def _transform(self, df):
        # this override is what appears to be ignored after save/load
        return df.filter(df.id > 23)
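The symptom is consistent with how ML persistence works: saving writes out only the transformer's params and its JVM class name, so the Python subclass, and with it the `_transform` override, is never reconstructed on load. A minimal plain-Python analogy of that mechanism (no Spark required; all names below are illustrative, not Spark internals):

```python
# Sketch: why a Python _transform override disappears after save/load.
# Spark's writer stores only the params (here, `statement`) plus the JVM
# class name; the reader rebuilds the *base* class from that name, so the
# Python subclass and its _transform are gone.

class SQLTransformer:
    def __init__(self, statement=None):
        self.statement = statement

    def _transform(self, rows):
        # stand-in for executing `statement` against the data
        return list(rows)

class Filter(SQLTransformer):
    def __init__(self):
        super().__init__(statement="select text, label from __THIS__")

    def _transform(self, rows):
        # custom logic that only exists on the Python side
        return [r for r in rows if r["id"] > 23]

def save(transformer):
    # persists the class name and params -- nothing about the subclass
    return {"class": "SQLTransformer",
            "paramMap": {"statement": transformer.statement}}

def load(metadata):
    # the reader instantiates the class named in the metadata
    cls = {"SQLTransformer": SQLTransformer}[metadata["class"]]
    return cls(**metadata["paramMap"])

rows = [{"id": 10}, {"id": 42}]
before = Filter()._transform(rows)             # custom filter applied
after = load(save(Filter()))._transform(rows)  # base behavior only
```

Here `before` keeps only the row with `id > 23`, while `after` keeps both rows: the loaded object is a plain `SQLTransformer`, which is exactly the "only the default statement runs" behavior described above.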

Upvotes: 0

Views: 119

Answers (1)

user10648740

Reputation:

Such an information flow is not supported. To create a Transformer that can be used from both the Python and the Scala code base you have to:

  • Implement a Java or Scala Transformer, in your case extending org.apache.spark.ml.feature.SQLTransformer.
  • Add a Python wrapper extending pyspark.ml.wrapper.JavaTransformer, the same way pyspark.ml.feature.SQLTransformer does, and interface the JVM counterpart from it.

Upvotes: 1

Related Questions