Bab

Reputation: 191

Add sequence number to every row in a dataframe - Spark Scala

I need to add a sequence number to each row I process in a DataFrame. Every time I add a row, I need to take the maximum sequence number from the existing rows, add 1, and assign the result to the new row.

Any idea how we can achieve this with a DataFrame in Spark Scala?

Example:

Below is the existing data in the table:

row_id,emp_id,sal
1,11,2000
2,22,3000

Now I need to add a new row to the table as follows:

3,33,5000

We need to generate the row id every time we insert new data into the table, by taking max(row_id) from the table and adding 1 to it.

Please suggest any ideas.

Thanks,

Upvotes: 1

Views: 1786

Answers (1)

Minato

Reputation: 462

Spark DataFrames are immutable, so you cannot append or insert rows in place. Instead, build a DataFrame that holds the new row and union it with the existing one. Here is a quick solution to your problem. It is not a good solution, because it needs a union every time a new row is added.

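import org.apache.spark.sql.SparkSession

// Assumes a local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().appName("AddRowId").master("local[*]").getOrCreate()

// Load the existing table; inferSchema reads the columns as integers.
val data = spark
  .read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("data.csv")

data.createOrReplaceTempView("dView")
// One-row DataFrame whose row_id is max(row_id) + 1. SUM(emp_id) and SUM(sal)
// only stand in for the new row's values (11 + 22 = 33, 2000 + 3000 = 5000).
val sqld = spark.sql("SELECT MAX(row_id)+1,SUM(emp_id),SUM(sal) FROM dView")
// union matches columns by position, so the order must match `data`.
val finalD = data.union(sqld)
finalD.show()
spark.stop()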

data.csv:

row_id,emp_id,sal
1,11,2000
2,22,3000

Output:

+------+------+----+
|row_id|emp_id| sal|
+------+------+----+
|     1|    11|2000|
|     2|    22|3000|
|     3|    33|5000|
+------+------+----+
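
If several new rows arrive at once, the same max(row_id) + 1 idea can be applied to the whole batch with a window function, so only one union is needed per batch. The following is a minimal sketch, not a definitive implementation: the `incoming` DataFrame, its second row (44, 6000), and the choice to order new rows by emp_id are assumptions made for illustration, and the existing data is built inline instead of being read from data.csv.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, lit, max, row_number}

// Assumes a local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().appName("SeqIds").master("local[*]").getOrCreate()
import spark.implicits._

// Existing table, as in the question.
val existing = Seq((1, 11, 2000), (2, 22, 3000)).toDF("row_id", "emp_id", "sal")

// Hypothetical batch of new rows that have no row_id yet.
val incoming = Seq((33, 5000), (44, 6000)).toDF("emp_id", "sal")

// Current maximum row_id; falls back to 0 when the table is empty.
val maxId = existing.agg(coalesce(max($"row_id"), lit(0))).head().getInt(0)

// row_number() yields 1, 2, 3, ... in emp_id order. A window without
// partitionBy moves all rows into one partition, so this suits small batches.
val w = Window.orderBy($"emp_id")
val numbered = incoming
  .withColumn("row_id", row_number().over(w) + lit(maxId))
  .select("row_id", "emp_id", "sal")

// unionByName matches columns by name rather than by position.
val result = existing.unionByName(numbered)
result.orderBy("row_id").show()
```

With the sample data, the row for emp_id 33 receives row_id 3 and the row for emp_id 44 receives row_id 4. Note that reading the maximum and then assigning ids is not safe when two jobs write to the same table concurrently, since both may read the same maximum.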

Upvotes: 1
