Reputation: 25
I have a simple question about the PySpark hash function.
I have checked that in Scala, Spark uses Murmur3Hash, based on Hash function in spark.
I want to know exactly which algorithm is used by the hash function in PySpark (https://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/functions.html#hash).
Could anyone answer this? I would also like to see the code that shows which algorithm the PySpark hash function uses.
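For concreteness, this is the function I mean (a minimal sketch; the DataFrame and column names are just placeholders):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("abc",)], ["word"])
# pyspark.sql.functions.hash returns a 32-bit signed integer per row
df.select(F.hash("word").alias("hash")).show()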
Upvotes: 1
Views: 3330
Reputation: 191
Please note that reproducing the hash values outside PySpark is not trivial, at least in Python. PySpark uses its own implementation of this algorithm, and standalone Murmur3 libraries run in Python do not give the same results.
Even Scala's and PySpark's hash algorithms aren't directly compatible; the reason for this is explained in https://stackoverflow.com/a/46472986/10999642
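You can see the mismatch for yourself by comparing Spark's output against a standalone Python Murmur3 library such as mmh3 (a rough sketch; it assumes mmh3 is installed, and 42 is the seed Spark is reported to use, per the linked answer):

import mmh3
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
# Spark's value, computed on the JVM side
spark_val = spark.range(1).select(F.hash(F.lit("abc"))).first()[0]
# mmh3's value for the same string with the same seed
py_val = mmh3.hash("abc", 42)
print(spark_val, py_val)  # these generally differ, e.g. for string inputs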
So if reproducibility within Python is important, you can use Python's built-in hash function, like so:

from pyspark.sql import functions as F, types as T

udf_hash = F.udf(lambda val: hash(val), T.LongType())
df = df.withColumn("hash", udf_hash("<column name>"))
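One caveat with this approach: in Python 3, hash() of strings is salted per interpreter process, so to get values that are stable across runs you need to pin PYTHONHASHSEED on the driver and on the executors (e.g. via spark.executorEnv.PYTHONHASHSEED).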
Upvotes: 2
Reputation: 42402
PySpark is just a wrapper around the Scala Spark code, so I believe it uses the same hash function as Scala Spark.
In your link to the source code, you can see that it calls sc._jvm.functions.hash, which essentially points to the equivalent function in the Scala source code (inside the JVM).
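The relevant wrapper in pyspark/sql/functions.py looks roughly like this (paraphrased from the linked source, not copied verbatim):

from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column, _to_seq

def hash(*cols):
    sc = SparkContext._active_spark_context
    # Delegate to org.apache.spark.sql.functions.hash on the JVM side
    jc = sc._jvm.functions.hash(_to_seq(sc, cols, _to_java_column))
    return Column(jc)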
Upvotes: 1