sunil

Reputation: 1279

Can I join two dataSources and permanently store the result as a new dataSource in Druid?

Druid now supports joins, but I see they are still a little slow when we join a big fact table with a mid-size dimension table. Can we do the join once, create a new dataSource from it, and store the resulting dataSource in Druid for further queries? If so, how can we do it?

I went through the Druid documentation but could not find a reference to this. I would appreciate any information on it.

Thanks

Upvotes: 2

Views: 1600

Answers (1)

58k723f1

Reputation: 619

Yes, you can create a native index job that reads data from Druid and writes it to a new dataSource.

You can use the "combining" inputSource, which lets you read data from multiple sources.

The inputSource would be something like this:

...
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "combining",
    "delegates": [
      {
        "type": "druid",
        "dataSource": "dataSource1",
        "interval": "2021-01-01/2021-01-02"
      },
      {
        "type": "druid",
        "dataSource": "dataSource2",
        "interval": "2021-01-01/2021-01-02"
      }
    ]
  }
},
...
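
If you have more than two dataSources, or want to change the interval often, you could generate that ioConfig instead of writing it by hand. This is only a rough sketch: the helper name `combining_io_config` is made up for illustration, it uses only the fields shown in the fragment above, and it does not build the rest of the ingestion spec (dataSchema, tuningConfig), which you still have to supply yourself.

```python
import json


def combining_io_config(data_sources, interval):
    """Build an index_parallel ioConfig that reads several Druid dataSources.

    Hypothetical helper: it only mirrors the fields from the fragment above.
    """
    if not data_sources:
        raise ValueError("at least one dataSource is required")

    # One "druid" inputSource per existing dataSource, all on the same interval.
    delegates = [
        {"type": "druid", "dataSource": name, "interval": interval}
        for name in data_sources
    ]

    # Wrap the delegates in a single "combining" inputSource.
    return {
        "type": "index_parallel",
        "inputSource": {"type": "combining", "delegates": delegates},
    }


io_config = combining_io_config(
    ["dataSource1", "dataSource2"], "2021-01-01/2021-01-02"
)

# Print the fragment so it can be pasted into the full ingestion spec.
print(json.dumps({"ioConfig": io_config}, indent=2))
```

As far as I understand, the combining inputSource reads the rows of each delegate rather than matching them on a key, so check that the resulting dataSource has the shape you need before relying on it.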

See these pages for more information:

https://druid.apache.org/docs/latest/ingestion/native-batch-input-sources.html#combining-input-sources

https://druid.apache.org/docs/latest/ingestion/native-batch-input-sources.html#druid-input-source

Upvotes: 2
