RichaDwivedi

Reputation: 343

Scala MurmurHash3 library not matching Spark Hash function

Both Scala and Spark use the same MurmurHash3 algorithm, but the results are different. Any idea why?

Upvotes: 3

Views: 2057

Answers (1)

RichaDwivedi

Reputation: 343

I found a way to hash a string in Scala so that the result matches Spark's hash function.

Spark ships its own Murmur3_x86_32 class (based on Guava's implementation), so we can call it directly as shown below to hash a string:

Seed value used in Spark = 42

String encoding = UTF-8

import org.apache.spark.unsafe.types.UTF8String
import org.apache.spark.unsafe.hash.Murmur3_x86_32._

// Encode the string as UTF-8, which is the form Spark hashes
val s = UTF8String.fromString("Formatted String Goes Here")

// 42 is the seed value used by Spark's hash function
hashUnsafeBytes(s.getBaseObject, s.getBaseOffset, s.numBytes(), 42)

which returns the same hash code as Spark's hash function.
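As for why scala.util.hashing.MurmurHash3 gives a different value: as far as I can tell, its stringHash works on the string's UTF-16 chars with its own string seed rather than on UTF-8 bytes with seed 42, and Spark's hashUnsafeBytes mixes each trailing byte (when the length is not a multiple of 4) as a separate word instead of using the standard Murmur3 tail. The sketch below is my own pure-Scala port of that logic, handy when the Spark jars are not on the classpath. The names SparkCompatibleMurmur3, hashBytes and hashString are made up for this example and are not part of Spark or Scala, and it is written from my reading of the algorithm, so check it against hashUnsafeBytes before relying on it.

```scala
import java.lang.Integer.rotateLeft
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper (not a Spark or Scala API): a sketch of Spark's
// Murmur3_x86_32.hashUnsafeBytes logic applied to a plain byte array.
object SparkCompatibleMurmur3 {
  private val C1 = 0xcc9e2d51
  private val C2 = 0x1b873593

  // Scramble one 32-bit input word
  private def mixK1(k: Int): Int = rotateLeft(k * C1, 15) * C2

  // Fold a scrambled word into the running hash
  private def mixH1(h: Int, k: Int): Int = rotateLeft(h ^ k, 13) * 5 + 0xe6546b64

  // Final avalanche step, which also mixes in the input length
  private def fmix(h: Int, length: Int): Int = {
    var x = h ^ length
    x ^= x >>> 16
    x *= 0x85ebca6b
    x ^= x >>> 13
    x *= 0xc2b2ae35
    x ^= x >>> 16
    x
  }

  def hashBytes(data: Array[Byte], seed: Int): Int = {
    val aligned = data.length - data.length % 4
    var h = seed
    var i = 0
    // Body: consume 4 bytes at a time as a little-endian Int
    while (i < aligned) {
      val word = (data(i) & 0xff) |
        ((data(i + 1) & 0xff) << 8) |
        ((data(i + 2) & 0xff) << 16) |
        ((data(i + 3) & 0xff) << 24)
      h = mixH1(h, mixK1(word))
      i += 4
    }
    // Tail: each leftover byte is sign-extended and mixed as its own word,
    // which is where this departs from the standard Murmur3 tail handling
    while (i < data.length) {
      h = mixH1(h, mixK1(data(i).toInt))
      i += 1
    }
    fmix(h, data.length)
  }

  // Seed 42 and UTF-8 encoding, as described in the answer above
  def hashString(s: String, seed: Int = 42): Int = hashBytes(s.getBytes(UTF_8), seed)
}
```

Calling SparkCompatibleMurmur3.hashString("Formatted String Goes Here") should then give the same value as the hashUnsafeBytes call above. For inputs whose UTF-8 length is a multiple of 4 the tail loop never runs, so the result also agrees with scala.util.hashing.MurmurHash3.bytesHash on the same bytes and seed.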

Upvotes: 4
