Feng Yu
Feng Yu

Reputation: 33

How to create an auto increment id column using Spark (pure SQL no Python)

How can I generate an ID number in Spark SQL? In the python interface, Spark has the monotonically_increasing_id() function. But I do not know how to realize this function in SQL grammar. I want to create a table from an old table with some string operations and establish an id column for the new table.

Upvotes: 2

Views: 1445

Answers (3)

Marja van der Wind
Marja van der Wind

Reputation: 332

I would say it is best to use Spark-SQL syntax for the auto-increment:

The syntax is as following: GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ]

The start with and increment values have defaults so you only need this:

Example:

create table gen1 (
     id long GENERATED ALWAYS AS IDENTITY
   , t string
)

This works on Delta tables in Databricks for instance. There it needs runtime version 10.4 or above.

--Edited

Sources: [1] https://community.databricks.com/t5/data-engineering/is-delta-table-with-auto-increment-column-as-unique-identifier/td-p/17768
[2] Auto increment id in delta table while inserting

Upvotes: 1

thrombi
thrombi

Reputation: 19

this should use spark.sql ID bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1) from databricks spark sql

Upvotes: 0

user9276715
user9276715

Reputation:

This does the trick I hope:

ROW_NUMBER() over (ORDER BY myColumn)

Upvotes: 2

Related Questions