Reputation: 477
I have started exploring Apache Pinot and have a few questions about its schema. I want to understand how Apache Pinot works with a Kafka topic that has an Avro schema (one that includes nested objects, arrays of objects, etc.), because I couldn't find any resource or example that shows how to ingest data from Kafka when the messages have an Avro schema.
As I understand it, Apache Pinot requires a flat schema, and for nested JSON objects the other option is to use a transform function. Is there any kind of Kafka Connect connector for Pinot for data ingestion?
Avro schema:
{
  "namespace": "my.avro.ns",
  "name": "MyRecord",
  "type": "record",
  "fields": [
    {"name": "uid", "type": "int"},
    {"name": "somefield", "type": "string"},
    {"name": "options", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "lvl2_record",
        "fields": [
          {"name": "item1_lvl2", "type": "string"},
          {"name": "item2_lvl2", "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "lvl3_record",
              "fields": [
                {"name": "item1_lvl3", "type": "string"},
                {"name": "item2_lvl3", "type": "string"}
              ]
            }
          }}
        ]
      }
    }}
  ]
}
Kafka Avro Message:
{
  "uid": 29153333,
  "somefield": "somevalue",
  "options": [
    {
      "item1_lvl2": "a",
      "item2_lvl2": [
        {
          "item1_lvl3": "x1",
          "item2_lvl3": "y1"
        },
        {
          "item1_lvl3": "x2",
          "item2_lvl3": "y2"
        }
      ]
    }
  ]
}
Upvotes: 0
Views: 1298
Reputation: 21
You don't need a separate connector to ingest data into Pinot from Kafka or from other streaming systems such as Kinesis or Apache Pulsar. You simply configure the Pinot table to point to the stream source (the Kafka broker in your case), along with any transformations you need to map the Kafka schema (Avro or otherwise) to the schema in Pinot.
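As a rough illustration of what "pointing the table at the stream" can look like, here is a minimal sketch that builds a REALTIME table config as a Python dict and prints it as JSON. The table name, topic, broker address, and schema registry URL are hypothetical placeholders. The decoder and consumer factory class names are the ones I believe ship with recent Pinot releases for Confluent-style Avro; they vary by Pinot and Kafka version, so treat them as assumptions to verify against the documentation for your release. A real config also needs a time column and a matching Pinot schema, which are omitted here.

```python
import json


def build_realtime_table_config(table_name, topic, broker_list, registry_url):
    """Build a Pinot REALTIME table config dict for an Avro-encoded Kafka topic.

    All argument values are placeholders supplied by the caller.
    """
    return {
        "tableName": table_name,
        "tableType": "REALTIME",
        "segmentsConfig": {"schemaName": table_name, "replication": "1"},
        "tenants": {},
        "tableIndexConfig": {
            "loadMode": "MMAP",
            "streamConfigs": {
                "streamType": "kafka",
                "stream.kafka.topic.name": topic,
                "stream.kafka.broker.list": broker_list,
                # Assumed class names; check them against your Pinot version.
                "stream.kafka.consumer.factory.class.name": (
                    "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory"
                ),
                "stream.kafka.decoder.class.name": (
                    "org.apache.pinot.plugin.inputformat.avro.confluent."
                    "KafkaConfluentSchemaRegistryAvroMessageDecoder"
                ),
                "stream.kafka.decoder.prop.schema.registry.rest.url": registry_url,
            },
        },
        "metadata": {},
    }


# Hypothetical values, for illustration only.
config = build_realtime_table_config(
    table_name="myrecord",
    topic="my-avro-topic",
    broker_list="localhost:9092",
    registry_url="http://localhost:8081",
)
print(json.dumps(config, indent=2))
```

The resulting JSON is what you would submit to the Pinot controller when creating the table; no Kafka Connect job is involved.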
How you should store the data in Pinot (table schema in Pinot) is more a function of how you want to query it.
If you are only interested in a particular field inside your nested field, you can configure a simple ingestion transform to extract it during ingestion and store it as a column in Pinot.
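To make the idea of extracting one nested field concrete, the sketch below does the extraction in plain Python on the sample message from the question, and then shows a transform entry of the kind that would go under `ingestionConfig.transformConfigs` in the table config. The column name `first_item1_lvl2` and the helper `extract_path` are hypothetical, and the `jsonPathString` expression is an assumption about how the path would be written for this schema, so verify it against your Pinot version before relying on it.

```python
# Sample message from the question.
message = {
    "uid": 29153333,
    "somefield": "somevalue",
    "options": [
        {
            "item1_lvl2": "a",
            "item2_lvl2": [
                {"item1_lvl3": "x1", "item2_lvl3": "y1"},
                {"item1_lvl3": "x2", "item2_lvl3": "y2"},
            ],
        }
    ],
}


def extract_path(record, path):
    """Walk a nested dict/list structure following a list of keys and indices."""
    value = record
    for step in path:
        value = value[step]
    return value


# What the extracted column value would be for this message.
first_item1_lvl2 = extract_path(message, ["options", 0, "item1_lvl2"])

# Assumed shape of the corresponding ingestion transform entry.
transform_config = {
    "columnName": "first_item1_lvl2",  # hypothetical Pinot column name
    "transformFunction": "jsonPathString(options, '$[0].item1_lvl2')",
}

print(first_item1_lvl2)
print(transform_config)
```

The extracted value is then an ordinary single-value column that can be filtered and grouped like any other.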
If you want to preserve the entire nested JSON blob in a column and then query the blob, you can use JSON indexing.
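For the JSON indexing route, a minimal sketch of the idea: the nested `options` value is serialized to a JSON string, stored in a string column, and that column is listed for JSON indexing. The column name `options_json` is hypothetical, and the config fragment (a `jsonFormat` transform plus `jsonIndexColumns`) reflects my understanding of the Pinot table config, so treat it as an assumption to check against the documentation.

```python
import json

# The nested part of the sample message from the question.
options = [
    {
        "item1_lvl2": "a",
        "item2_lvl2": [
            {"item1_lvl3": "x1", "item2_lvl3": "y1"},
            {"item1_lvl3": "x2", "item2_lvl3": "y2"},
        ],
    }
]

# The string that would be stored in the JSON column for this message.
options_json = json.dumps(options, separators=(",", ":"))

# Assumed table config fragment: serialize `options` into a string column
# during ingestion, then build a JSON index on that column.
json_index_fragment = {
    "ingestionConfig": {
        "transformConfigs": [
            {
                "columnName": "options_json",  # hypothetical column name
                "transformFunction": "jsonFormat(options)",
            }
        ]
    },
    "tableIndexConfig": {"jsonIndexColumns": ["options_json"]},
}

print(options_json)
print(json.dumps(json_index_fragment, indent=2))
```

The trade-off compared with extracting fields up front is flexibility versus cost: the blob keeps every nested value queryable, but the index takes extra storage and queries on it use JSON-specific predicates.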
Here are some pointers for your reference:
You may also want to consider joining the Apache Pinot Slack community for Apache Pinot related questions.
Upvotes: 2