Reputation: 191
I need to add a sequence number to each row I process in a DataFrame. Each time I add a row, I need to take the maximum sequence number from the existing rows, add 1, and assign the result to the new row.
Any idea how we can achieve this with a DataFrame in Spark Scala?
Example:
row_id,emp_id, sal
1,11,2000
2,22,3000
3,33,5000
We need to generate the row id each time we insert new data into the table, by taking max(row_id) from the table and adding 1 to it.
Please suggest any ideas.
Thanks,
Upvotes: 1
Views: 1786
Reputation: 462
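Spark DataFrames are immutable, so it is not possible to append or insert rows in place. Instead, use union, which returns a new DataFrame. Here's a quick solution to your problem. It is not a good solution, since you need to perform a union every time a new row is added. In this example the new row's emp_id and sal are simply the column sums, used as placeholder values.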
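// `spark` is the SparkSession (already available in spark-shell).
// Load the existing rows, inferring column types from the data.
val data = spark
  .read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("data.csv")
data.createOrReplaceTempView("dView")
// Build a single new row whose row_id is MAX(row_id) + 1.
val sqld = spark.sql("SELECT MAX(row_id) + 1, SUM(emp_id), SUM(sal) FROM dView")
// union matches columns by position and returns a new DataFrame.
val finalD = data.union(sqld)
finalD.show()
spark.stop()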
data.csv:
row_id,emp_id,sal
1,11,2000
2,22,3000
Output:
+------+------+----+
|row_id|emp_id| sal|
+------+------+----+
|     1|    11|2000|
|     2|    22|3000|
|     3|    33|5000|
+------+------+----+
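If the new rows arrive as a batch rather than one at a time, something like the following might be closer to what the question asks. It is only a sketch: the helper name appendWithRowIds, the sample values in incoming, and the choice to order new rows by emp_id are all assumptions made for illustration, not anything from the original post. It reads the current max(row_id) once, then numbers the incoming rows with row_number() offset by that maximum.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, col, lit, max, row_number}

// Local session for the sketch; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .appName("row-id-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Existing rows that already carry a row_id (values from the question).
val existing = Seq((1, 11, 2000), (2, 22, 3000)).toDF("row_id", "emp_id", "sal")

// Hypothetical batch of new rows that has no row_id yet.
val incoming = Seq((33, 5000), (44, 6000)).toDF("emp_id", "sal")

// Hypothetical helper: give each incoming row max(row_id) + 1, + 2, ...
def appendWithRowIds(existing: DataFrame, incoming: DataFrame): DataFrame = {
  // Current maximum as a Long; 0 when `existing` is empty (max is null then).
  val maxId = existing
    .agg(coalesce(max(col("row_id")).cast("long"), lit(0L)))
    .head()
    .getLong(0)

  // A window without partitionBy pulls all incoming rows into one partition,
  // which is acceptable for small batches only. Ordering by emp_id is an
  // arbitrary choice to make the numbering deterministic.
  val w = Window.orderBy(col("emp_id"))

  val numbered = incoming
    .withColumn(
      "row_id",
      // Cast back so the new column has the same type as the existing one.
      (row_number().over(w) + lit(maxId)).cast(existing.schema("row_id").dataType)
    )
    // Reorder columns to match `existing`, since union is positional.
    .select(existing.columns.map(c => col(c)): _*)

  existing.union(numbered)
}

val result = appendWithRowIds(existing, incoming)
result.orderBy("row_id").show()
```

Note that reading the maximum and then writing is not safe if several jobs insert at the same time, since two of them could see the same max(row_id).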
Upvotes: 1