WestCoastProjects

Reputation: 63221

What are the available output formats for writeStream in Spark structured streaming

Consider a generic writeStream invocation, using the typical "console" output format:

out.writeStream
  .outputMode("complete")
  .format("console")
  .start()

What are the alternatives? I actually noticed that the default is parquet:

In DataStreamWriter:

  /**
   * Specifies the underlying output data source.
   *
   * @since 2.0.0
   */
  def format(source: String): DataStreamWriter[T] = {
    this.source = source
    this
  }

  private var source: String = df.sparkSession.sessionState.conf.defaultDataSourceName

In SQLConf:

  def defaultDataSourceName: String = getConf(DEFAULT_DATA_SOURCE_NAME)

  val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
    .doc("The default data source to use in input/output.")
    .stringConf
    .createWithDefault("parquet")
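
As an aside, that same config key can be inspected or overridden at runtime (a quick sketch; spark below is an assumed SparkSession):

spark.conf.get("spark.sql.sources.default")          // "parquet"
spark.conf.set("spark.sql.sources.default", "json")  // change the default sink format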

But then how is the path for the parquet file specified? Which other formats are supported, and what options do they have or require?

Upvotes: 5

Views: 23322

Answers (2)

vatsal mevada

Reputation: 5636

Here is the official Spark documentation on output sinks: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks

As of Spark 2.4.1, five formats are supported out of the box (two of them are sketched just below the list):

  • File sink
  • Kafka sink
  • Foreach sink
  • Console sink
  • Memory sink
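
As promised, here are hedged sketches of two of these, with option names taken from the programming guide (the server, topic, and path values are hypothetical):

// Kafka sink: bootstrap servers are required, and a checkpoint
// location is mandatory for fault tolerance
df.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092,host2:9092")
  .option("topic", "updates")
  .option("checkpointLocation", "/tmp/kafka-ckpt")
  .start()

// Memory sink: keeps the result in an in-memory table for debugging;
// the query name becomes the table name
df.writeStream
  .format("memory")
  .queryName("my_table")
  .outputMode("complete")
  .start()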

On top of that, one can also implement a custom sink by extending Spark's Sink API: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Sink.scala
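
For a rough idea of what that involves, here is a minimal sketch. Note that the Sink trait lives in an internal package, so this leans on non-public API, and the class names here are made up:

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// Hypothetical sink that prints every micro-batch; illustration only
class PrintlnSink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // data is tied to the query's incremental execution, so
    // collecting it like this is only safe for a toy example
    data.collect().foreach(row => println(s"batch $batchId: $row"))
  }
}

// Provider that Spark instantiates when the format string names this class
class PrintlnSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = new PrintlnSink
}

The sink would then be referenced by the provider's fully qualified class name, e.g. .format("com.example.PrintlnSinkProvider").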

Upvotes: 4

WestCoastProjects

Reputation: 63221

I found one reference: https://community.hortonworks.com/questions/89282/structured-streaming-writestream-append-to-file.html

It seems that option("path", path) can be used:

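A minimal sketch of that, assuming a streaming DataFrame out and a hypothetical output directory (the file sink only supports "append" mode and needs a checkpoint location):

out.writeStream
  .outputMode("append")
  .format("parquet")
  .option("path", "/tmp/stream-out")
  .option("checkpointLocation", "/tmp/stream-ckpt")
  .start()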

Upvotes: 2
