Reputation: 5126
I wanted to add a column based on existing columns in a pyspark dataframe.
With pandas I can add the column like this:
transform_df['geohash'] = transform_df.apply(lambda x: pgh.encode(x.lat, x.lng, precision=9), axis=1)
How can I do the same in Spark? I tried the following, but it fails with an error saying a user-defined function cannot have more than one argument:
some_udf = F.udf(lambda x: pgh.encode(x.lat, x.lng, precision=9))
transform_df = transform_df.withColumn('geohash',
    some_udf(F.col(transform_df['lat'], transform_df['lng'])))
Upvotes: 0
Views: 256
Reputation: 215137
Since your UDF expects input from two different columns, your lambda function also needs to have two parameters:
some_udf = F.udf(lambda lat, lng: pgh.encode(lat, lng, precision=9))
# ^^^ ^^^ two parameters corresponding to two input columns below
transform_df = transform_df.withColumn('geohash', some_udf(transform_df['lat'], transform_df['lng']))
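For completeness, here is a minimal, self-contained sketch of the same pattern (the sample coordinates and the pygeohash import alias pgh are assumptions for illustration; F.udf defaults to StringType, which already matches pgh.encode's string return value, but declaring it explicitly documents the intent):

import pygeohash as pgh
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with the same column names as in the question
transform_df = spark.createDataFrame([(57.64911, 10.40744)], ['lat', 'lng'])

# The lambda receives one value per input column, row by row
geohash_udf = F.udf(lambda lat, lng: pgh.encode(lat, lng, precision=9),
                    StringType())

transform_df = transform_df.withColumn(
    'geohash', geohash_udf(transform_df['lat'], transform_df['lng']))
transform_df.show()  # the geohash column holds 9-character strings such as 'u4pruydqq'

Note that a plain Python UDF serializes every row between the JVM and the Python worker; on large data a pandas_udf is usually faster, but the call pattern shown above stays the same.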
Upvotes: 1