Steven

Reputation: 15258

Use a dict of functions in a UDF

In PySpark, I have a dataframe df as follows:

Site    A   B
1       3   83
1       16  26
1       98  46
1       80  14
1       83  54
2       0   83
2       75  67
2       72  24
2       60  13
6       40  50
6       34  60
6       36  39
6       68  6
6       91  51
6       81  82

On the other side, I have a dictionary myDict of functions: myDict = {1: f1, 2: f2, 3: f3, 6: f6}

I want to use the dictionary to generate a new column, something like: df.withColumn("MyCol", myDict[df.Site](df.A, df.B))

But when I do that like this, I receive the error:

unhashable type: 'Column'

Traceback (most recent call last):

TypeError: unhashable type: 'Column'

How should I write it?

Upvotes: 0

Views: 1274

Answers (1)

TMichel

Reputation: 4442

You want to use currying.

The withColumn function only accepts existing columns of the same dataframe as arguments, or a literal wrapped with the lit() function (lit actually returns a Column).

In order to pass in extra parameters you must use a higher-order function that returns a udf:

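from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

class MyUDFs():
    @staticmethod
    def trans(myDict):
        # cb closes over myDict and is called with plain Python values for each row
        def cb(Site, A, B):
            return myDict[Site](A, B)
        # StringType() is the declared return type; change it if the functions return something else
        return udf(cb, StringType())

df = df.withColumn("MyCol", MyUDFs.trans(myDict)(df["Site"], df["A"], df["B"]))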
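
The closure pattern used here can be checked without a Spark session. Below is a minimal sketch in plain Python, assuming hypothetical stand-ins f1, f2 and f6 for the functions in myDict (the question does not show them) and a hypothetical helper name make_dispatcher. In Spark, the inner function is what would be wrapped with udf(...).

```python
# Hypothetical stand-ins for the functions stored in myDict
# (the real f1, f2, f6 are not shown in the question).
def f1(a, b):
    return a + b

def f2(a, b):
    return a - b

def f6(a, b):
    return a * b

myDict = {1: f1, 2: f2, 6: f6}

def make_dispatcher(func_by_site):
    """Return a function that remembers func_by_site (a closure)."""
    def cb(site, a, b):
        # site, a and b are plain Python values here, so the dict lookup works
        return func_by_site[site](a, b)
    return cb

dispatch = make_dispatcher(myDict)

# First row of each Site from the question's dataframe
rows = [(1, 3, 83), (2, 0, 83), (6, 40, 50)]
results = [dispatch(site, a, b) for site, a, b in rows]
print(results)  # [86, -83, 2000]
```

Because cb receives ordinary values row by row, the dictionary lookup never sees a Column object, which is what triggered the TypeError in the question. A Site value that is missing from the dictionary would raise a KeyError, so a default function may be worth adding.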

Upvotes: 1
