YFl

Reputation: 1379

Spark: what UUID version is used in the built-in uuid() function implementation?

SparkSQL has the uuid() SQL built-in function.

However, the documentation does not state the UUID version, and I could not find the source code after a quick search.

I assume it is likely UUID v4.

What is the version used to implement it?

Thanks.


Bonus question: Where is it implemented in the source code? I would be happy to see it.

Upvotes: 1

Views: 923

Answers (1)

M_S

Reputation: 3733

I am not sure, but when I run this sample query (SELECT uuid();), I can see this in the query details:

(2) Project [codegen id : 1]
Output [1]: [uuid(Some(-1736932742140897221)) AS uuid()#8]
Input: []

In the Spark repo, the UUID expression is defined in misc.scala:

  :
  usage = """_FUNC_() - Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.""",
  examples = """
    Examples:
      > SELECT _FUNC_();
       46707d92-02f4-4817-8116-a4c3b23e6266
  """,
  note = """
    The function is non-deterministic.
  """,
  since = "2.3.0",
  group = "misc_funcs")
  :

and it uses RandomUUIDGenerator, which provides further details about the algorithm:

  • For the algorithm, see RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace, section 4.4 "Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers".

And from the above document we can see that Spark's implementation complies with UUID v4:

4.4. Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers

The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers.

The algorithm is as follows:

o Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.

o Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number from Section 4.1.3.

o Set all the other bits to randomly (or pseudo-randomly) chosen values.
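
To make the three steps above concrete, here is a minimal Python sketch of the RFC 4122 section 4.4 bit manipulation. It is not Spark's code (RandomUUIDGenerator lives in the Scala codebase); the function name v4_from_random_bits and the seed 42 are made up for illustration, and Python's random module merely stands in for whatever random source is used. The last lines also check the example UUID from the function's documentation quoted above.

```python
import random
import uuid


def v4_from_random_bits(msb: int, lsb: int) -> uuid.UUID:
    """Apply the RFC 4122 section 4.4 bit settings to two random 64-bit halves.

    Hypothetical helper for illustration only, not Spark's implementation.
    """
    # time_hi_and_version is the low 16 bits of the most significant half:
    # clear its four most significant bits, then write version number 4 there.
    msb = (msb & 0xFFFFFFFFFFFF0FFF) | 0x0000000000004000
    # clock_seq_hi_and_reserved is the top byte of the least significant half:
    # force bit 7 to one and bit 6 to zero (the RFC 4122 variant).
    lsb = (lsb & 0x3FFFFFFFFFFFFFFF) | 0x8000000000000000
    # All remaining bits keep their (pseudo-)random values.
    return uuid.UUID(int=(msb << 64) | lsb)


rng = random.Random(42)  # arbitrary seed, chosen only to make the run repeatable
generated = v4_from_random_bits(rng.getrandbits(64), rng.getrandbits(64))
print(generated, generated.version, generated.variant == uuid.RFC_4122)

# The example value from the uuid() documentation also reports version 4.
docs_example = uuid.UUID("46707d92-02f4-4817-8116-a4c3b23e6266")
print(docs_example.version)  # 4
```

In the canonical 36-character string, the version is the first hex digit of the third group (4817 in the documentation example), and the variant is encoded in the first hex digit of the fourth group (8116), which is one of 8, 9, a, or b for an RFC 4122 UUID.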

Upvotes: 3
