Reputation: 1379
Spark SQL has the uuid() built-in SQL function.
However, the documentation does not state the UUID version, and I could not find the source code after a quick search.
I assume it is likely UUID v4.
What is the version used to implement it?
Thanks.
Bonus question: Where is it implemented in the source code? I would be happy to see it.
Upvotes: 1
Views: 923
Reputation: 3733
I am not sure, but when I run this sample query (SELECT uuid();), I can see this in the query details:
(2) Project [codegen id : 1]
Output [1]: [uuid(Some(-1736932742140897221)) AS uuid()#8]
Input: []
In the Spark repo, the UUID expression is defined in misc.scala
:
usage = """_FUNC_() - Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.""",
examples = """
Examples:
> SELECT _FUNC_();
46707d92-02f4-4817-8116-a4c3b23e6266
""",
note = """
The function is non-deterministic.
""",
since = "2.3.0",
group = "misc_funcs")
:
It uses RandomUUIDGenerator, which provides further details about the algorithm:
- For the algorithm, see RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace, section 4.4 "Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers".
From the above document we can see that Spark's implementation complies with UUID v4:
4.4. Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers
The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers.
The algorithm is as follows:
o Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.
o Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number from Section 4.1.3.
o Set all the other bits to randomly (or pseudo-randomly) chosen values.
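To make the three steps above concrete, here is a minimal sketch in plain Python. It is not Spark's RandomUUIDGenerator; the function name `random_uuid_v4` and the use of `secrets` as the random source are my own assumptions for illustration. It also checks the example value from the function's documentation with the standard `uuid` module.

```python
import secrets
import uuid


def random_uuid_v4() -> str:
    """Hypothetical helper: build a version 4 UUID string per RFC 4122, section 4.4."""
    # Start with 128 randomly chosen bits (16 bytes).
    raw = bytearray(secrets.token_bytes(16))
    # time_hi_and_version: set the four most significant bits to 0100 (version 4).
    raw[6] = (raw[6] & 0x0F) | 0x40
    # clock_seq_hi_and_reserved: set the two most significant bits to 1 and 0.
    raw[8] = (raw[8] & 0x3F) | 0x80
    h = raw.hex()
    # Canonical 36-character form: 8-4-4-4-12 hex digits.
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"


generated = uuid.UUID(random_uuid_v4())
# Example value taken from the usage text of the uuid() function quoted above.
doc_example = uuid.UUID("46707d92-02f4-4817-8116-a4c3b23e6266")

print(generated, generated.version)
print(doc_example.version, doc_example.variant)
```

You can also read the version directly from the string: the first hex digit of the third group is the version (the `4` in `4817`), and the first hex digit of the fourth group is 8, 9, a, or b for the RFC 4122 variant (the `8` in `8116`).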
Upvotes: 3