Reputation: 6216
I have a function that tries to pass a broadcast variable into a UDF.
The function looks like:
def generate_lookup_code(self, lookup_map):
    lookup_map_broadcast = spark_session.sparkContext.broadcast(lookup_map)
    print("lookup_map has been broadcasted")

    # The UDF only returns a constant string
    def _generate_code(bc_reasoncode_lookup_map):
        reasoncode_lookup_map = bc_reasoncode_lookup_map.value
        return "hello"

    udfGenerateCode = F.udf(_generate_code, StringType())
    input_df = input_df.withColumn('code', udfGenerateCode(lookup_map_broadcast))
    input_df.show()
All I want is to pass the broadcast variable to the UDF, but I get this error:
'Broadcast' object has no attribute '_get_object_id'
What am I doing wrong?
Upvotes: 1
Views: 6287
Reputation: 1182
You don't need to pass the broadcast variable as a UDF argument; just reference it from inside the function:
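from pyspark.sql import functions as F
from pyspark.sql.types import StringType

lookup_map_broadcast = spark_session.sparkContext.broadcast(lookup_map)

def _generate_code():
    # Read the broadcast value through the closure, not through a UDF argument
    reasoncode_lookup_map = lookup_map_broadcast.value
    return "hello"

udfGenerateCode = F.udf(_generate_code, StringType())
input_df = input_df.withColumn('code', udfGenerateCode())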
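A UDF is called once per row, and its arguments must be columns (or literals wrapped as columns, for example with F.lit). A Broadcast object is neither, which is why passing it as an argument fails with the '_get_object_id' error.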
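If the function needs to be parameterized by the broadcast variable, one option is a small factory that binds the broadcast handle once and returns a function taking only per-row values. Below is a minimal sketch of that pattern. It uses a hypothetical `FakeBroadcast` stand-in (anything exposing `.value`) so it runs without a Spark session; `make_code_lookup`, the `reason` argument, the `default` parameter, and the sample map are all assumptions for illustration, not part of the original code.

```python
class FakeBroadcast:
    """Hypothetical stand-in for a Spark Broadcast: only exposes .value."""

    def __init__(self, value):
        self.value = value


def make_code_lookup(bc_lookup_map, default="unknown"):
    """Return a per-row function that closes over the broadcast handle."""

    def _generate_code(reason):
        # .value is read inside the function body; the handle itself is
        # captured by the closure and never passed as a call argument.
        return bc_lookup_map.value.get(reason, default)

    return _generate_code


# Assumed sample data, for illustration only.
lookup = make_code_lookup(FakeBroadcast({"A1": "approved", "R7": "rejected"}))
print(lookup("A1"))
print(lookup("ZZ"))
```

With Spark, the real broadcast would take the place of the stand-in, e.g. `F.udf(make_code_lookup(lookup_map_broadcast), StringType())`, and the resulting UDF would be applied to a column such as `F.col('reason')` (a hypothetical column name).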
Upvotes: 4