lbz

Reputation: 81

Specifying "basePath" option in Spark Structured Streaming

Is it possible to set the basePath option when reading partitioned data in Spark Structured Streaming (in Java)? I want to load only the data in a specific partition, such as basepath/x=1/, but I also want x to be loaded as a column. Setting basePath the way I would for a non-streaming dataframe doesn't seem to work.

Here's a minimal example. I have a dataframe containing the following data:

+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  3|  4|
+---+---+

I wrote this as a Parquet file to a subdirectory named x=1.

The following code (with a regular non-streaming dataframe) works fine:

Dataset<Row> data = sparkSession.read()
  .option("basePath", basePath)
  .parquet(basePath + "/x=1");

data.show();

This produces the expected result:

+---+---+---+
|  a|  b|  x|
+---+---+---+
|  1|  2|  1|
|  3|  4|  1|
+---+---+---+

However, the following (using the Structured Streaming API) doesn't work:

StructType schema = data.schema(); // data as defined above

Dataset<Row> streamingData = sparkSession.readStream()
  .schema(schema)
  .option("basePath", basePath)
  .parquet(basePath + "/x=1");

streamingData.writeStream()
  .trigger(Trigger.Once())  // process all data available now in one batch, then stop
  .format("console")
  .start().awaitTermination();

The dataframe, in this case, doesn't contain any rows:

+---+---+---+
|  a|  b|  x|
+---+---+---+
+---+---+---+

Upvotes: 8

Views: 20560

Answers (1)

Med Zamrik

Reputation: 323

I'm not sure whether this will work for Spark Streaming, but it works for my batch processing in Scala. I would avoid using basePath entirely. For example, when my data is partitioned by year/month/day and I want to process one day at a time, I loop over the dates and build each partition path with string interpolation:

import java.text.SimpleDateFormat
import java.sql.Timestamp
import java.util.Calendar

val dateStart: String = "01/14/2012"
val dateStop: String = "01/18/2012"

val format: SimpleDateFormat = new SimpleDateFormat("MM/dd/yyyy")

val d1 = new Timestamp(format.parse(dateStart).getTime())
val d2 = new Timestamp(format.parse(dateStop).getTime())

val diffDays: Long = (d2.getTime() - d1.getTime()) / (24 * 60 * 60 * 1000)

val cal: Calendar = Calendar.getInstance()
cal.setTimeInMillis(d1.getTime())
for (i <- 0 to diffDays.toInt) {
    val year = cal.get(Calendar.YEAR)
    // Calendar.MONTH is zero-based; add 1, assuming the partitions use months 1-12
    val month = cal.get(Calendar.MONTH) + 1
    val day = cal.get(Calendar.DAY_OF_MONTH)
    val dataframe1 = spark.read
        .load(s"s3://bucketName/somepath/year=$year/month=$month/day=$day")
    /*
    Do your dataframe manipulation here
    */
    cal.add(Calendar.DAY_OF_YEAR, 1)
}

You can do this with a list of strings or integers as well. If you need the partition value as a column, you can always append it to the dataframe as a new column (sketched below). I'm not sure whether this would work for your case with Spark streaming, though.
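For example, here's a minimal sketch of appending the partition values as columns, assuming the dataframe1 and the year/month/day values from the loop above (the column names are illustrative):

import org.apache.spark.sql.functions.lit

// The partition values live in the directory path, not in the files,
// so attach them to the dataframe as literal columns.
val withPartitions = dataframe1
    .withColumn("year", lit(year))
    .withColumn("month", lit(month))
    .withColumn("day", lit(day))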

Upvotes: 1
