BigD

Reputation: 888

How to collect a streaming dataset (to a Scala value)?

How can I store a DataFrame value in a Scala variable?

I need to store a value from the DataFrame below in a variable (assume the "timestamp" column produces the same value in every row), and later I need to use this variable somewhere.

I have tried the following:

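   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.unix_timestamp
   import org.apache.spark.sql.types.{StructType, TimestampType}

   val spark = SparkSession.builder().appName("micro").
     enableHiveSupport().config("hive.exec.dynamic.partition", "true").
     config("hive.exec.dynamic.partition.mode", "nonstrict").
     config("spark.sql.streaming.checkpointLocation", "hdfs://dff/apps/hive/warehouse/area.db").
     getOrCreate()
   import spark.implicits._  // enables the $"col" syntax and .as[String]

   val xmlSchema = new StructType().add("id", "string").add("time_xml", "string")
   val xmlData = spark.readStream.option("sep", ",").schema(xmlSchema).csv("file:///home/shp/sourcexml")
   // "MM" is month-of-year; the lowercase "mm" used originally means minute-of-hour
   val xmlDf_temp = xmlData.select($"id", unix_timestamp($"time_xml", "dd/MM/yyyy HH:mm:ss").cast(TimestampType).as("timestamp"))
   // This is the line that fails: collect() is called on a streaming Dataset
   val collect_time = xmlDf_temp.select($"timestamp").as[String].collect()(0)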

It throws the following error:

org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start()

Is there any way I can store some DataFrame values in a variable and use them later?

Upvotes: 0

Views: 710

Answers (1)

Jacek Laskowski

Reputation: 74599

Is there any way I can store some DataFrame values in a variable and use them later?

That's not possible in Spark Structured Streaming: a streaming query never ends, so an action such as collect, which needs the complete result, cannot be expressed on it.
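
To make this concrete, here is a minimal sketch that reproduces the behaviour without the CSV files from the question. It assumes a local Spark installation (2.2 or later) and uses the built-in "rate" source as a stand-in for the real input; the object name, helper names and the 5 second timeout are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.util.Try

object CollectOnStream {
  // Assumption: the built-in "rate" source replaces the CSV stream from the question.
  // It emits rows with the columns (timestamp, value).
  def rateStream(spark: SparkSession): DataFrame =
    spark.readStream.format("rate").option("rowsPerSecond", "1").load()

  // Returns true when calling an action directly on the streaming Dataset fails.
  def collectFails(stream: DataFrame): Boolean =
    Try(stream.collect()).isFailure

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("collect-on-stream")
      .getOrCreate()

    val stream = rateStream(spark)
    println(s"isStreaming = ${stream.isStreaming}")
    println(s"collect fails = ${collectFails(stream)}")

    // A streaming Dataset only runs once it is started through writeStream.
    val query = stream.writeStream.format("console").start()
    query.awaitTermination(5000L) // let a few micro-batches run, then shut down
    query.stop()
    spark.stop()
  }
}
```

The failed collect is the same AnalysisException as in the question; only the writeStream.start() path actually executes the query.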

and later I need to use this variable somewhere

This "later" has to be another streaming query that you join with this one to produce a result.
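
A rough sketch of what such a join could look like, assuming Spark 2.3 or later (where inner joins between two streams are supported). Both inputs are "rate" sources standing in for the real data, and the object name, helper name and the columns id, event_time and reference_time are hypothetical, not taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object JoinTwoStreams {
  // Assumption: a "rate" source stands in for each real input stream.
  // The generated counter becomes the join key "id".
  def rateStream(spark: SparkSession, timeColumn: String): DataFrame =
    spark.readStream.format("rate").option("rowsPerSecond", "1").load()
      .select(col("value").as("id"), col("timestamp").as(timeColumn))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("join-two-streams")
      .getOrCreate()

    // Instead of collecting a timestamp into a variable, keep it as a stream...
    val reference = rateStream(spark, "reference_time")
    // ...and join it with the stream that needs the value "later".
    val events = rateStream(spark, "event_time")
    val joined = events.join(reference, "id")

    val query = joined.writeStream.outputMode("append").format("console").start()
    query.awaitTermination(10000L) // run for about ten seconds, then shut down
    query.stop()
    spark.stop()
  }
}
```

Without watermarks the join state grows without bound, so a long-running job would normally add withWatermark and a time-range condition to the join.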

Upvotes: 1
