Vanshaj Bhatia

Reputation: 77

Spark DataFrame to Google Cloud PubSub

I want to stream or batch-load data from a Spark DataFrame to Pub/Sub. I came across some libraries:

  1. Apache Bahir: Useful only for streaming data from Pub/Sub. https://bahir.apache.org/docs/spark/2.2.1/spark-streaming-pubsub/
  2. Pub/Sub Lite Connector: Capable of writing to Pub/Sub Lite; not sure whether it works with Pub/Sub.

Upvotes: 0

Views: 1572

Answers (1)

Sayan Bhattacharya

Reputation: 1368

You cannot use the Pub/Sub Lite connector to write messages to Pub/Sub. Although Pub/Sub and Pub/Sub Lite are both horizontally scalable, managed messaging services, they differ enough to be two separate products.

You can refer to this documentation to check the differences between Pub/Sub and Pub/Sub Lite. From the doc:

Pub/Sub is usually the default solution for most application integration and analytics use cases.
Pub/Sub Lite is only recommended for applications where achieving extremely low cost justifies some additional operational work.

To stream or batch-load data from a Spark DataFrame to Pub/Sub, you can use Apache Bahir’s Pub/Sub connector.
You can find an example from Google Cloud Platform that uses Apache Bahir’s Spark Streaming connector for Google Cloud Pub/Sub.
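
If no connector fits, a common alternative (this is not the Bahir connector) is to publish each partition of the DataFrame with the standard Pub/Sub client library. Below is a minimal sketch, assuming PySpark and the `google-cloud-pubsub` package are installed on the executors. The helper names and the project and topic IDs are hypothetical, and rows are serialized as JSON purely for illustration.

```python
import json
from typing import Any, Callable, Iterable, Mapping


def encode_row(row: Mapping[str, Any]) -> bytes:
    """Serialize one row (as a dict) to UTF-8 JSON bytes; Pub/Sub payloads are bytes."""
    # default=str is a simplification so dates, decimals, etc. do not raise.
    return json.dumps(row, default=str, sort_keys=True).encode("utf-8")


def publish_rows(rows: Iterable[Mapping[str, Any]],
                 publish: Callable[[bytes], Any]) -> int:
    """Send every row through `publish` and return how many were sent."""
    futures = [publish(encode_row(row)) for row in rows]
    for future in futures:
        # The Pub/Sub client returns futures; wait on them so that a failed
        # publish fails the Spark task instead of being silently dropped.
        if hasattr(future, "result"):
            future.result()
    return len(futures)


def publish_partition(rows, project_id: str, topic_id: str) -> None:
    """Meant for df.foreachPartition: creates one publisher client per partition."""
    # Imported lazily so the module is resolved on the executor, not the driver.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    publish_rows(
        (row.asDict() for row in rows),
        lambda data: publisher.publish(topic_path, data),
    )


# Hypothetical usage with a batch DataFrame `df`:
# df.foreachPartition(lambda rows: publish_partition(rows, "my-project", "my-topic"))
```

For a streaming DataFrame, the same function could probably be called from `foreachBatch`, which hands each micro-batch over as an ordinary DataFrame. Creating the client inside the partition function avoids having to serialize it from the driver.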

Upvotes: 0
