anantmf

Reputation: 66

Apache Beam: streaming data from Kafka to a GCS bucket (not using Pub/Sub)

I have seen a lot of Apache Beam examples where you read data from Pub/Sub and write to a GCS bucket. However, is there any example of using KafkaIO and writing to a GCS bucket, where I can parse the message and put it in the appropriate bucket based on the message content?

For example:

message = {type="type_x", some other attributes....}
message = {type="type_y", some other attributes....}

type_x --> goes to bucket x
type_y --> goes to bucket y

My use case is streaming data from Kafka to a GCS bucket, so if someone can suggest a better way to do this in GCP, that is welcome too.

Thanks. Regards, Anant.

Upvotes: 2

Views: 608

Answers (2)

Khalid K

Reputation: 356

You can use Secor to load messages into a GCS bucket. Secor can also parse incoming messages and put them under different paths in the same bucket.

Upvotes: 1

Jayadeep Jayaraman

Reputation: 2825

You can take a look at the example present here - https://github.com/0x0ece/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java

Once you have read the data elements, if you want to write to multiple destinations based on a specific data value, you can use multiple outputs via TupleTagList. The details can be found here - https://beam.apache.org/documentation/programming-guide/#additional-outputs
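To illustrate the routing part, here is a minimal, self-contained sketch of the type-extraction and bucket-selection logic. The bucket paths, the `type="..."` message format, and the class/method names are all assumptions based on the question; in a real pipeline this logic would live inside a `DoFn`, emitting each element to a per-type `TupleTag` (or, alternatively, you could pass `destinationFor` to `FileIO.writeDynamic().by(...)`):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageRouter {
    // Hypothetical destination buckets - substitute your own GCS paths.
    static final String BUCKET_X = "gs://bucket-x/output/";
    static final String BUCKET_Y = "gs://bucket-y/output/";
    // Messages with an unrecognized or missing type go to a dead-letter bucket.
    static final String BUCKET_UNKNOWN = "gs://bucket-dead-letter/output/";

    // Assumes messages carry a type="type_x"-style attribute, as in the question.
    private static final Pattern TYPE = Pattern.compile("type=\"?(\\w+)\"?");

    /** Extracts the "type" attribute from a raw message, or returns null if absent. */
    static String extractType(String message) {
        Matcher m = TYPE.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    /** Maps a message to its destination bucket based on its type attribute. */
    static String destinationFor(String message) {
        String type = extractType(message);
        if ("type_x".equals(type)) return BUCKET_X;
        if ("type_y".equals(type)) return BUCKET_Y;
        return BUCKET_UNKNOWN;
    }

    public static void main(String[] args) {
        // Inside a Beam DoFn you would instead call
        // out.get(tagFor(type)).output(message) with one TupleTag per type.
        System.out.println(destinationFor("{type=\"type_x\", other=1}"));
        System.out.println(destinationFor("{type=\"type_y\", other=2}"));
    }
}
```

With the tagged outputs in place, each resulting `PCollection` can then be written with its own `TextIO.write().to(...)` pointing at the matching bucket.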

Upvotes: 0
