Reputation: 3672
Hi, I am using a custom UDF to take the square root of each value in each column:
import math
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
square_root_UDF = udf(lambda x: math.sqrt(x), DoubleType())
for x in features:
    dataTraining = dataTraining.withColumn(x, square_root_UDF(x))
Is there a faster way to get this done? The polynomial expansion function is not suitable in this case.
Upvotes: 4
Views: 10008
Reputation: 51073
To add the sqrt result as a new column in Scala you need to do the following:
import hc.implicits._  // hc is your SQLContext/HiveContext
import org.apache.spark.sql.functions.sqrt

// bind to a new name: `val dataTraining = dataTraining...` refers to itself and won't compile
val dataTrainingWithStd = dataTraining.withColumn("x_std", sqrt('x_variance))
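The PySpark equivalent is just as short; a minimal sketch, assuming dataTraining is a DataFrame with a column named x_variance:

from pyspark.sql.functions import sqrt

dataTraining = dataTraining.withColumn("x_std", sqrt("x_variance"))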
Upvotes: 3
Reputation: 1517
To speed up the calculation in this case: if your dataTraining is an RDD, convert it to a DataFrame first and apply the built-in sqrt function:
from pyspark.sql import SparkSession
from pyspark.sql.functions import sqrt

spark = SparkSession.builder.appName("SessionName") \
    .config("spark.some.config.option", "some_value") \
    .getOrCreate()

df = spark.createDataFrame(dataTraining)
for x in features:
    df = df.withColumn(x, sqrt(x))
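If features lists many columns, the loop adds one projection per column to the query plan; a single select rewrites them all in one pass. A minimal sketch, assuming features holds the names of the numeric columns:

from pyspark.sql.functions import sqrt

# keep non-feature columns as-is, replace feature columns with their square roots
df = df.select([sqrt(c).alias(c) if c in features else c for c in df.columns])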
Upvotes: 0
Reputation: 91
Don't use a UDF. Instead, use the built-in sqrt:
from pyspark.sql.functions import sqrt

for x in features:
    dataTraining = dataTraining.withColumn(x, sqrt(x))
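To see why this is faster, compare the physical plans: the built-in sqrt runs entirely inside the JVM and is optimized by Catalyst, while the Python UDF ships every row to a Python worker and back. A minimal sketch, assuming a numeric column f1 (hypothetical name) and the square_root_UDF from the question:

# built-in: plan shows SQRT(...), no Python stage
dataTraining.select(sqrt("f1")).explain()
# UDF: plan includes a Python-evaluation node (rows sent to Python workers)
dataTraining.select(square_root_UDF("f1")).explain()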
Upvotes: 4