Niket Arora

Reputation: 71

Read protobuf kafka message using spark structured streaming

Is it possible to read protobuf messages from Kafka using Spark Structured Streaming?

Upvotes: 1

Views: 2502

Answers (1)

Niket Arora

Reputation: 71

Approach 1

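    // Register the protobuf deserializer as a UDF that returns a struct matching `schema`.
    sparkSession.udf().register("deserialize", getDeserializer(), schema);

    DataStreamReader dataStreamReader = sparkSession.readStream().format("kafka");

    // Apply the Kafka source options (e.g. kafka.bootstrap.servers, subscribe).
    for (Map.Entry<String, String> kafkaPropEntry : kafkaProps.entrySet()) {
        dataStreamReader.option(kafkaPropEntry.getKey(), kafkaPropEntry.getValue());
    }

    // Decode the Kafka value bytes into an `event` struct, then flatten its fields into columns.
    Dataset<Row> kafkaRecords =
            dataStreamReader.load()
                    .selectExpr("deserialize(value) as event").select("event.*");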

Approach 2

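    final StructType schema = getSchema();

    DataStreamReader dataStreamReader = sparkSession.readStream().format("kafka");

    // Apply the Kafka source options (e.g. kafka.bootstrap.servers, subscribe).
    for (Map.Entry<String, String> kafkaPropEntry : kafkaProps.entrySet()) {
        dataStreamReader.option(kafkaPropEntry.getKey(), kafkaPropEntry.getValue());
    }

    // Decode each Kafka value once and emit the complete Row. The MapFunction cast avoids
    // the overload ambiguity javac can report between the Java and Scala variants of map().
    Dataset<Row> kafkaRecords = dataStreamReader.load()
            .map((MapFunction<Row, Row>) row -> getOutputRow((byte[]) row.get(VALUE_INDEX)),
                    RowEncoder.apply(schema));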

Approach 1 has one flaw: the deserialize UDF is called multiple times, once for every column in event (see https://issues.apache.org/jira/browse/SPARK-17728). Approach 2 avoids this by mapping each protobuf message directly to a Row with the map method, so every message is decoded only once.
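
To make the difference in call counts concrete, here is a small plain-Java analogy. It is not Spark code and it does not parse real protobuf: the Event class, the comma-separated "wire format", and the perColumn/perRow helpers are all hypothetical stand-ins, assumed only for illustration. perColumn mimics what happens when each selected column re-evaluates deserialize(value); perRow mimics decoding once and building the whole row.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class DeserializeCallCount {
    // Hypothetical stand-in for a decoded protobuf event with three fields.
    static final class Event {
        final String id;
        final String type;
        final long timestamp;

        Event(String id, String type, long timestamp) {
            this.id = id;
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    // Counts how often the (simulated) deserializer runs.
    static final AtomicInteger calls = new AtomicInteger();

    // Stand-in for protobuf parsing: decodes "id,type,timestamp" from UTF-8 bytes.
    static Event deserialize(byte[] value) {
        calls.incrementAndGet();
        String[] parts = new String(value, StandardCharsets.UTF_8).split(",");
        return new Event(parts[0], parts[1], Long.parseLong(parts[2]));
    }

    // Analogy for approach 1: every output column evaluates deserialize(value) again.
    static Object[] perColumn(byte[] value) {
        return new Object[] {
            deserialize(value).id,
            deserialize(value).type,
            deserialize(value).timestamp
        };
    }

    // Analogy for approach 2: decode once, then build the whole row from the result.
    static Object[] perRow(byte[] value) {
        Event event = deserialize(value);
        return new Object[] { event.id, event.type, event.timestamp };
    }

    // Resets the counter, runs one strategy on one record, and returns the call count.
    static int countCalls(Function<byte[], Object[]> strategy, byte[] value) {
        calls.set(0);
        strategy.apply(value);
        return calls.get();
    }

    public static void main(String[] args) {
        byte[] value = "42,click,1700000000".getBytes(StandardCharsets.UTF_8);
        System.out.println("per column: " + countCalls(DeserializeCallCount::perColumn, value));
        System.out.println("per row:    " + countCalls(DeserializeCallCount::perRow, value));
    }
}
```

Both strategies produce the same row, but for this three-field event the per-column version runs the decoder three times per record and the per-row version runs it once. With wide protobuf messages the extra decoding work in approach 1 grows with the number of columns.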

Upvotes: 2
