Reputation: 343
Scala MurmurHash3 library not matching Spark hash function
Both Scala and Spark use the same MurmurHash3 algorithm, but the results are different. Any idea why?
Upvotes: 3
Views: 2057
Reputation: 343
I found a way to compute, in Scala, the same hash that Spark's hash function returns for a string.
Spark ships its own Murmur3_x86_32 class (based on Guava's Murmur3_32HashFunction), so we can call it directly as below to hash a string:
Seed value used in Spark = 42
String encoding = UTF-8
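import org.apache.spark.unsafe.types.UTF8String
import org.apache.spark.unsafe.hash.Murmur3_x86_32._

// Spark hashes the UTF-8 bytes of the string, so wrap it in a UTF8String first
val s = UTF8String.fromString("Formatted String Goes Here")
// 42 is the seed Spark's hash function uses
hashUnsafeBytes(s.getBaseObject, s.getBaseOffset, s.numBytes(), 42)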
This returns the same hash code as Spark's hash function.
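As for why the plain Scala library gives a different number: as far as I can tell, scala.util.hashing.MurmurHash3.stringHash mixes the string's UTF-16 chars with its own default seed, while Spark hashes the UTF-8 bytes with seed 42. Also, Spark's hashUnsafeBytes appears to mix each leftover byte (when the length is not a multiple of 4) as a separate block, which is not what standard Murmur3 does. Below is a minimal sketch of that behaviour without a Spark dependency. It is based on my reading of the Spark source, the names SparkStyleMurmur3, hashBytes and hashString are made up for illustration, and I have not checked it against every Spark version:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.util.hashing.MurmurHash3

// Hypothetical helper (not part of Spark or the Scala library): mimics what
// Spark's Murmur3_x86_32.hashUnsafeBytes appears to do, using only the stdlib.
object SparkStyleMurmur3 {
  // Seed used by Spark's hash function, per the answer above
  val SparkSeed = 42

  def hashBytes(bytes: Array[Byte], seed: Int): Int = {
    val aligned = bytes.length - bytes.length % 4
    var h = seed
    var i = 0
    // Body: 4-byte little-endian blocks, same as standard Murmur3
    while (i < aligned) {
      val k = (bytes(i) & 0xFF) |
        ((bytes(i + 1) & 0xFF) << 8) |
        ((bytes(i + 2) & 0xFF) << 16) |
        ((bytes(i + 3) & 0xFF) << 24)
      h = MurmurHash3.mix(h, k)
      i += 4
    }
    // Tail: assumption from the Spark source, each remaining byte is
    // sign-extended to an Int and mixed as a full block of its own
    while (i < bytes.length) {
      h = MurmurHash3.mix(h, bytes(i).toInt)
      i += 1
    }
    MurmurHash3.finalizeHash(h, bytes.length)
  }

  // Spark hashes the UTF-8 encoding of the string
  def hashString(s: String): Int = hashBytes(s.getBytes(UTF_8), SparkSeed)
}
```

For inputs whose UTF-8 length is a multiple of 4, this should agree with MurmurHash3.bytesHash(bytes, 42); for other lengths the tail handling makes the two diverge, which would explain the mismatch in the question. Compare SparkStyleMurmur3.hashString("...") against the hashUnsafeBytes call above before relying on it.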
Upvotes: 4