Reputation: 2305
How can I modify the code below to fetch only the last row in the table, specifically the value in the key column? It is a huge table, and I need the last row's key value to know how much has been loaded so far. I do not care about the rest of the contents.
val df = spark.sqlContext.read
  .format("datasource")
  .option("project", "character")
  .option("apiKey", "xx")
  .option("type", "tables")
  .option("batchSize", "10000")
  .option("database", "humans")
  .option("table", "healthGamma")
  .option("inferSchema", "true")
  .option("inferSchemaLimit", "1")
  .load()

df.createTempView("tables")

spark.sqlContext.sql("select * from tables")
  .repartition(1)
  .write.option("header", "true")
  .parquet("lifes_remaining")
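A minimal sketch of pulling just that single value out of the loaded DataFrame, assuming the column really is named key and that it increases as rows are loaded, instead of writing the whole table out:

import org.apache.spark.sql.functions.max

// Sketch only: aggregate the largest "key" value so that just one value,
// not the whole table, comes back to the driver.
val lastKey = df.agg(max("key")).collect()(0)(0)
println(s"Highest key loaded so far: $lastKey")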
Upvotes: 0
Views: 1862
Reputation: 125
You can use orderBy on a DataFrame like this, hope it helps:
df.orderBy($"key".desc).show(1)
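If you only care about the key itself rather than the full row, a variant of the same idea (column name taken from the question, not verified against the actual schema) that touches only that one column:

import spark.implicits._  // for the $"..." column syntax

// Sort the single "key" column descending and keep one row.
df.select($"key").orderBy($"key".desc).limit(1).show()

Note that orderBy on a huge table forces a sort; if the key only ever grows as rows are loaded, an aggregate such as max on that column is usually cheaper.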
Upvotes: 1