Reputation: 2305
How can I modify the code below to fetch only the last row in the table, specifically the value in the key column? It is a huge table, and I need the last row's key value to know how much has been loaded so far. I do not care about the rest of the contents.
val df = spark.sqlContext.read
  .format("datasource")
  .option("project", "character")
  .option("apiKey", "xx")
  .option("type", "tables")
  .option("batchSize", "10000")
  .option("database", "humans")
  .option("table", "healthGamma")
  .option("inferSchema", "true")
  .option("inferSchemaLimit", "1")
  .load()

df.createTempView("tables")

spark.sqlContext.sql("select * from tables")
  .repartition(1)
  .write.option("header", "true")
  .parquet("lifes_remaining")
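A minimal sketch of pulling just that single value out of the loaded DataFrame, assuming the column really is named key and that it increases as rows are loaded, instead of writing the whole table out:

import org.apache.spark.sql.functions.max

// Sketch only: aggregate the largest "key" value so that just one value,
// not the whole table, comes back to the driver.
val lastKey = df.agg(max("key")).collect()(0)(0)
println(s"Highest key loaded so far: $lastKey")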
Upvotes: 0
Views: 1862
Reputation: 125
You can use orderBy on a DataFrame like this, hope it helps:
df.orderBy($"key".desc).show(1)
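If you only care about the key itself rather than the full row, a variant of the same idea (column name taken from the question, not verified against the actual schema) that touches only that one column:

import spark.implicits._  // for the $"..." column syntax

// Sort the single "key" column descending and keep one row.
df.select($"key").orderBy($"key".desc).limit(1).show()

Note that orderBy on a huge table forces a sort; if the key only ever grows as rows are loaded, an aggregate such as max on that column is usually cheaper.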
Upvotes: 1