Why do I need to specify columns as well as dimensionsSpec when creating a Druid schema?

Question

I tried to create my Druid schema, and I referred to an example like the following:

{"dimensionsSpec": {"dimensions": ["timestamp","netname"] },
 "columns":  ["second_time","timestamp"],
 "delimiter": "/001"
}

My question is: if I have already listed the dimensions, why do I need to list the columns as well? Also, should I put timestamp (which is in seconds) in the dimensions, given that my granularity is MINUTE?

Jainik · Accepted Answer

There is no need to specify the columns attribute in your ingestion spec; dimensionsSpec and metricsSpec are enough. Here's an example dimensionsSpec:

"dimensionsSpec" : {
    "dimensions": [
      "srcIP",
      { "name" : "srcPort", "type" : "long" },
      { "name" : "dstIP", "type" : "string" },
      { "name" : "dstPort", "type" : "long" },
      { "name" : "protocol", "type" : "string" }
    ]
  }
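In the list above, a bare string such as "srcIP" and an object such as { "name" : "srcPort", "type" : "long" } are both valid entries. The Python sketch below uses a hypothetical helper, normalize_dimensions (not part of Druid), to expand every entry into an explicit name/type pair, assuming that a bare string or an object without "type" means a string dimension, which is Druid's default.

```python
# Hypothetical helper (not a Druid API): expand a Druid-style "dimensions"
# list so every entry is an explicit {"name", "type"} object.
# Assumption: a missing type means a string dimension (Druid's default).
from typing import Union

DimensionEntry = Union[str, dict]


def normalize_dimensions(dimensions: list[DimensionEntry]) -> list[dict]:
    normalized = []
    for entry in dimensions:
        if isinstance(entry, str):
            # A bare name is shorthand for a string dimension.
            normalized.append({"name": entry, "type": "string"})
        else:
            # Object form: keep the name and default the type to "string".
            normalized.append({"name": entry["name"],
                               "type": entry.get("type", "string")})
    return normalized


dimensions_spec = {
    "dimensions": [
        "srcIP",
        {"name": "srcPort", "type": "long"},
        {"name": "dstIP", "type": "string"},
        {"name": "dstPort", "type": "long"},
        {"name": "protocol", "type": "string"},
    ]
}

for dim in normalize_dimensions(dimensions_spec["dimensions"]):
    print(dim["name"], dim["type"])
```

Writing long-typed dimensions in object form is what lets Druid store srcPort and dstPort as numbers instead of strings.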

Druid has excellent documentation. Good references on writing an ingestion spec: Writing Druid Ingestion Spec, Imply Ingestion Spec Docs

Answer to your 2nd question:

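There is no need to include timestamp in the dimension list: Druid reads the event time from the column named in timestampSpec (for epoch seconds, use "format": "posix") and stores it as the special __time column. To control granularity, use granularitySpec. Here's an example: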

"granularitySpec" : {
    "type" : "uniform",
    "segmentGranularity" : "HOUR",
    "queryGranularity" : "MINUTE",
    "rollup" : true
}

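Note that there are two kinds of granularity here: segmentGranularity sets the time interval that a single segment covers, and queryGranularity sets how finely timestamps are stored within each segment. Timestamps are truncated to queryGranularity at ingestion time, so it is also the finest granularity you can query at. With rollup enabled, rows that share the same truncated timestamp and dimension values are combined into one row.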
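To make the two granularities concrete for timestamps in epoch seconds, here is a minimal Python sketch of the effect of the granularitySpec above. It is not Druid code: roll_up is a hypothetical function, the sample rows are made up, and it assumes a single count metric, so "rolling up" just means counting the rows that merge.

```python
# Minimal sketch (not Druid code) of the granularitySpec above, applied to
# rows whose "timestamp" is in epoch seconds:
#   queryGranularity MINUTE  -> each timestamp is truncated to its minute
#   segmentGranularity HOUR  -> each row is assigned to an hourly segment
#   rollup true              -> rows with the same truncated timestamp and
#                               dimension values merge (assumed count metric)
from collections import defaultdict

MINUTE = 60
HOUR = 3600


def roll_up(rows: list[dict], dimensions: list[str]) -> dict[int, dict[tuple, int]]:
    """Return {segment_start: {(minute_start, *dim_values): row_count}}."""
    segments: dict[int, dict[tuple, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        ts = row["timestamp"]              # epoch seconds (assumed)
        minute_start = ts - ts % MINUTE    # queryGranularity: MINUTE
        segment_start = ts - ts % HOUR     # segmentGranularity: HOUR
        key = (minute_start, *(row[d] for d in dimensions))
        segments[segment_start][key] += 1  # rollup: count the merged rows
    return {seg: dict(keys) for seg, keys in segments.items()}


rows = [
    {"timestamp": 1_700_000_005, "netname": "a"},
    {"timestamp": 1_700_000_030, "netname": "a"},  # same minute and dims: merged
    {"timestamp": 1_700_000_030, "netname": "b"},
    {"timestamp": 1_700_003_700, "netname": "a"},  # lands in the next hour
]
print(roll_up(rows, ["netname"]))
```

Because the truncated timestamp is already part of every rolled-up row's key, adding the raw seconds timestamp as a dimension would keep rows with different seconds apart and largely defeat the MINUTE rollup.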
