Reputation: 33
How can I generate an ID number in Spark SQL? In the python interface, Spark has the monotonically_increasing_id() function. But I do not know how to realize this function in SQL grammar. I want to create a table from an old table with some string operations and establish an id column for the new table.
Upvotes: 2
Views: 1445
Reputation: 332
I would say it is best to use Spark-SQL syntax for the auto-increment:
The syntax is as following: GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ]
The start with and increment values have defaults so you only need this:
Example:
create table gen1 (
id long GENERATED ALWAYS AS IDENTITY
, t string
)
This works on Delta tables in Databricks for instance. There it needs runtime version 10.4 or above.
--Edited
Sources:
[1] https://community.databricks.com/t5/data-engineering/is-delta-table-with-auto-increment-column-as-unique-identifier/td-p/17768
[2] Auto increment id in delta table while inserting
Upvotes: 1
Reputation: 19
this should use spark.sql ID bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1) from databricks spark sql
Upvotes: 0