Reputation: 15258
In PySpark, I have a DataFrame df
as follows:
Site A B
1 3 83
1 16 26
1 98 46
1 80 14
1 83 54
2 0 83
2 75 67
2 72 24
2 60 13
6 40 50
6 34 60
6 36 39
6 68 6
6 91 51
6 81 82
On the other side, I have a dictionary myDict
of functions: myDict = {1: f1, 2: f2, 3: f3, 6: f6}
I want to use the dictionary to generate a new column. Something like:
df.withColumn("MyCol", myDict[df.Site](df.A, df.B))
But when I do it like this, I get the error:
Traceback (most recent call last):
TypeError: unhashable type: 'Column'
How should I write it?
Upvotes: 0
Views: 1274
Reputation: 4442
You want to use currying.
The withColumn
function only accepts existing columns of the same DataFrame as arguments, or a literal wrapped in lit()
(lit
actually returns a Column). Indexing a plain Python dict with df.Site fails because a Column object is not hashable.
In order to pass in extra parameters you must use a higher-order function that returns a udf
:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

class MyUDFs:
    @staticmethod
    def trans(myDict):
        # myDict is captured in a closure, so the lookup happens per row
        # with the plain Python value of Site, not with a Column object
        def cb(Site, A, B):
            return myDict[Site](A, B)
        return udf(cb, StringType())

df = df.withColumn("MyCol", MyUDFs.trans(myDict)(df["Site"], df["A"], df["B"]))
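The same currying pattern can be sketched in plain Python without Spark, using hypothetical per-site functions in place of f1, f2, etc.:

```python
def trans(my_dict):
    # Return a per-row callback that closes over my_dict;
    # the dict lookup uses an ordinary Python value as the key.
    def cb(site, a, b):
        return my_dict[site](a, b)
    return cb

# hypothetical per-site functions
funcs = {1: lambda a, b: a + b, 2: lambda a, b: a * b}
row_fn = trans(funcs)
print(row_fn(1, 3, 83))   # 86
print(row_fn(2, 75, 67))  # 5025
```

Wrapping cb in udf(...) instead of returning it directly is the only Spark-specific step.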
Upvotes: 1