theNextBigThing

Reputation: 131

Apache Druid: Issue while updating the data in a datasource

I am currently using Apache Druid 0.16.0-incubating. As described in the tutorial at https://druid.apache.org/docs/latest/tutorials/tutorial-update-data.html, a combining firehose can be used to update and merge the data in a datasource.

Step 1: I am using the same sample data as the tutorial, with this initial state:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘

Step 2: I updated the data for tiger with {"timestamp":"2018-01-01T01:01:35Z","animal":"tiger", "number":30}, using appendToExisting = false and rollup = true, and got this result:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     2 │    130 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘

Step 3: Now I am updating giraffe with {"timestamp":"2018-01-01T03:01:35Z","animal":"giraffe", "number":30}, again using appendToExisting = false and rollup = true, and I get the following result:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    130 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     2 │  14154 │
└──────────────────────────┴──────────┴───────┴────────┘

My question: in step 3 the count for tiger decreased by 1, but I expected it to stay unchanged, since step 3 contains no new rows for tiger and its number did not change either.

FYI, count and number are defined in the metricsSpec, with types count and longSum respectively. Please clarify.

A second case: when using the ingestSegment firehose with initial data like this:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T00:00:00.000Z │ aardvark │     1 │   9999 │
│ 2018-01-01T00:00:00.000Z │ bear     │     1 │    111 │
│ 2018-01-01T00:00:00.000Z │ lion     │     2 │    200 │
└──────────────────────────┴──────────┴───────┴────────┘

and adding a new row {"timestamp":"2018-01-01T03:01:35Z","animal":"giraffe", "number":30} with appendToExisting = true, I get:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T00:00:00.000Z │ aardvark │     1 │   9999 │
│ 2018-01-01T00:00:00.000Z │ bear     │     1 │    111 │
│ 2018-01-01T00:00:00.000Z │ lion     │     2 │    200 │
│ 2018-01-01T00:00:00.000Z │ aardvark │     1 │   9999 │
│ 2018-01-01T00:00:00.000Z │ bear     │     1 │    111 │
│ 2018-01-01T00:00:00.000Z │ giraffe  │     1 │     30 │
│ 2018-01-01T00:00:00.000Z │ lion     │     1 │    200 │
└──────────────────────────┴──────────┴───────┴────────┘

Is this the correct and expected output? Why didn't the rollup happen?

Upvotes: 1

Views: 1494

Answers (1)

58k723f1

Reputation: 619

Druid actually has only two ingestion modes: overwrite or append.

With appendToExisting=true, your data is appended to the existing data, which causes the "number" field to increase (and the count as well).
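To make the append case concrete, here is a minimal Python sketch that reproduces the seven-row table from the question. It is a toy model, not Druid code, and it rests on assumptions: that the task re-reads the three stored rows through the ingestSegment firehose, rolls them up together with the new giraffe row into one new segment (a count metric adding 1 per input row, longSum adding up "number"), and that the new segment is then stored next to the old one. The rollup helper and the DAY constant are hypothetical names.

```python
from collections import defaultdict

def rollup(rows, prefix_len):
    """Toy rollup: group by (truncated timestamp, animal)."""
    groups = defaultdict(lambda: {"count": 0, "number": 0})
    for row in rows:
        key = (row["timestamp"][:prefix_len], row["animal"])
        groups[key]["count"] += 1               # "count" metric: +1 per input row
        groups[key]["number"] += row["number"]  # "longSum" metric on "number"
    return [
        {"timestamp": ts, "animal": animal, **metrics}
        for (ts, animal), metrics in sorted(groups.items())
    ]

DAY = 10  # len("2018-01-01"); assumed stand-in for day-level truncation

# Rows already stored in the existing segment (from the question).
existing_segment = [
    {"timestamp": "2018-01-01", "animal": "aardvark", "count": 1, "number": 9999},
    {"timestamp": "2018-01-01", "animal": "bear", "count": 1, "number": 111},
    {"timestamp": "2018-01-01", "animal": "lion", "count": 2, "number": 200},
]
new_row = {"timestamp": "2018-01-01T03:01:35Z", "animal": "giraffe", "number": 30}

# Assumption: the task rolls up "stored rows + new row" into a NEW segment ...
new_segment = rollup(existing_segment + [new_row], DAY)
# ... and append mode keeps the old segment alongside it.
datasource = existing_segment + new_segment

for row in datasource:
    print(row)
```

In this model, rollup only happens inside the segment being written, so the old and new segments are never merged with each other. The stored lion row (count 2) also comes back with count 1, because it is read as a single input row.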

With appendToExisting=false, all the data in the segment is overwritten. I think this is what is happening.
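One plausible reading of step 3 in the question can be sketched in Python. This is a toy model rather than Druid code, and it assumes that the combining firehose feeds the rows already stored in the segment plus the new giraffe row into a fresh rollup, that a count metric adds 1 per input row (it does not sum the stored count column), and that the result replaces the old segment. The reingest helper and the MINUTE constant are hypothetical names.

```python
from collections import defaultdict

MINUTE = 16  # len("2018-01-01T01:01"); assumed stand-in for minute truncation

def reingest(rows):
    """Toy overwrite: roll up all input rows into a replacement segment."""
    segment = defaultdict(lambda: {"count": 0, "number": 0})
    for row in rows:
        key = (row["timestamp"][:MINUTE], row["animal"])
        segment[key]["count"] += 1               # counts input rows, not stored counts
        segment[key]["number"] += row["number"]  # longSum keeps accumulating
    return [
        {"timestamp": ts, "animal": animal, **metrics}
        for (ts, animal), metrics in sorted(segment.items())
    ]

# State of the datasource after step 2 (from the question).
after_step_2 = [
    {"timestamp": "2018-01-01T01:01:00.000Z", "animal": "tiger", "count": 2, "number": 130},
    {"timestamp": "2018-01-01T03:01:00.000Z", "animal": "aardvark", "count": 1, "number": 42},
    {"timestamp": "2018-01-01T03:01:00.000Z", "animal": "giraffe", "count": 1, "number": 14124},
]
update = {"timestamp": "2018-01-01T03:01:35Z", "animal": "giraffe", "number": 30}

# Stored rows and the new row are read together; the result overwrites the segment.
after_step_3 = reingest(after_step_2 + [update])

for row in after_step_3:
    print(row)
```

Under these assumptions the stored tiger row is read back as one input row, so its count becomes 1 while its number stays 130, and giraffe ends up with count 2 and number 14154, matching the step 3 table.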

This is different from "normal" databases, where you can update specific rows.

In Druid you can update specific rows too, but only by re-indexing your data, and that is not a very easy process. Re-indexing is done with an ingestSegment firehose, which reads your data from a segment and writes it back to a segment (which can be the same one). During this process you can add a transformation that performs a specific action, such as updating certain field values.

We have built a PHP library to make these processes easier to work with. See this example of how to re-index a segment and apply a transformation during the re-indexing:
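As a rough illustration of such a re-indexing task, the Python snippet below builds a fragment of a native index task spec and prints it as JSON. Treat it as a sketch under assumptions, not a complete or verified spec: the datasource name and interval are taken from the tutorial, the expression that rewrites tiger's number is a made-up example, declaring count as a longSum over the stored count column is my assumption for keeping counts intact during re-indexing, and a real task needs further fields (parser, tuningConfig and so on) that are left out here.

```python
import json

DATASOURCE = "updates-tutorial"      # assumed: name used in the Druid tutorial
INTERVAL = "2018-01-01/2018-01-03"   # assumed: interval used in the Druid tutorial

reindex_spec = {
    "type": "index",
    "spec": {
        "dataSchema": {
            "dataSource": DATASOURCE,
            "metricsSpec": [
                # Assumption: sum the stored count instead of counting input rows.
                {"type": "longSum", "name": "count", "fieldName": "count"},
                {"type": "longSum", "name": "number", "fieldName": "number"},
            ],
            "transformSpec": {
                "transforms": [
                    {
                        "type": "expression",
                        "name": "number",
                        # Hypothetical update: overwrite number for tiger rows only.
                        "expression": "if(animal == 'tiger', 30, number)",
                    }
                ]
            },
        },
        "ioConfig": {
            "type": "index",
            # Read the rows back from the existing segments ...
            "firehose": {
                "type": "ingestSegment",
                "dataSource": DATASOURCE,
                "interval": INTERVAL,
            },
            # ... and overwrite them instead of appending.
            "appendToExisting": False,
        },
    },
}

print(json.dumps(reindex_spec, indent=2))
```

The printed JSON is only a starting point to compare against the ingestion spec reference for your Druid version before submitting anything to the Overlord.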

https://github.com/level23/druid-client#reindex

Upvotes: 4
