Krzysztof

Reputation: 2082

Elasticsearch - best way for multiple updates to an index?

I'm integrating with an external system.

I get 3 files from it:

The order of records in each of them can be random.

There is:

Goal:

Merge the files together and put the result into one index in Elasticsearch.

Additional info:

- each file has circa 1 million records

- this operation will be done each night

- data is used only for search purposes

Options:

a) I thought about:

  1. Parse the first file and add it to ES

  2. Do the same with the next files and update the documents created in point one

Looks very inefficient.

b) another way:

  1. Parse the first file and add it to a relational database

  2. Do the same with the other files and update the records from point one

  3. Propagate data to ES

Can you see any other options?

Upvotes: 0

Views: 399

Answers (1)

Chules

Reputation: 426

I assume you have a normalized relational data structure with 1-to-n relationships in those CSV files, like this:

customer_data.csv

Id;Name;AdressId;AdditionalCustomerDataId;...
0;Mike;2;1;...

address_data.csv

Id;Street;City;...
....
2;Abbey Road;London;...

additional_customer_data.csv

Id;someData;...
...
1;data;...

In that case, I would denormalize those files in a preprocessing step into one single CSV and use that to upload the data to ES. To avoid downtime, you can then use aliases. The preprocessing can be done in any language, but converting the CSVs into SQLite tables and joining them there will probably be fastest.
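A minimal sketch of that preprocessing step in Python, assuming the three example files above; the file names, column lists, and output file are illustrative only:

```python
import csv
import sqlite3

# Load the three CSVs into SQLite tables, then join them into one
# denormalized CSV. Columns follow the example files above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE customer (Id, Name, AdressId, AdditionalCustomerDataId)")
cur.execute("CREATE TABLE address (Id, Street, City)")
cur.execute("CREATE TABLE additional (Id, someData)")

def load(path, table, n_cols):
    with open(path, newline="") as f:
        rows = csv.reader(f, delimiter=";")
        next(rows)  # skip the header line
        placeholders = ",".join("?" * n_cols)
        cur.executemany(f"INSERT INTO {table} VALUES ({placeholders})",
                        (row[:n_cols] for row in rows))

load("customer_data.csv", "customer", 4)
load("address_data.csv", "address", 3)
load("additional_customer_data.csv", "additional", 2)

# Denormalize with a single join; the result is one row per customer.
cur.execute("""
    SELECT c.Id, c.Name, a.Street, a.City, d.someData
    FROM customer c
    LEFT JOIN address a ON a.Id = c.AdressId
    LEFT JOIN additional d ON d.Id = c.AdditionalCustomerDataId
""")

with open("customers_denormalized.csv", "w", newline="") as out:
    writer = csv.writer(out, delimiter=";")
    writer.writerow(["Id", "Name", "Street", "City", "someData"])
    writer.writerows(cur)
```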

I wouldn't choose a strategy that creates just half of the document and adds the additional information later, as every such update means the document probably has to be reindexed afterwards.
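For the upload side, here is a hedged sketch assuming a recent elasticsearch Python client and the denormalized CSV from the step above; the index and alias names are invented. The idea is to bulk-index the complete documents into a fresh index each night and then switch the alias, so the search application never sees a half-built index:

```python
import csv

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

new_index = "customers_2024_01_01"   # e.g. derived from the import date
alias = "customers"                  # the name the search application queries

es.indices.create(index=new_index)

def actions():
    # One bulk action per denormalized row; the customer Id becomes the doc id.
    with open("customers_denormalized.csv", newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            yield {"_index": new_index, "_id": row["Id"], "_source": row}

bulk(es, actions())

# Atomically repoint the alias from the old index (if any) to the new one.
old_indices = list(es.indices.get_alias(name=alias)) if es.indices.exists_alias(name=alias) else []
es.indices.update_aliases(actions=
    [{"remove": {"index": i, "alias": alias}} for i in old_indices]
    + [{"add": {"index": new_index, "alias": alias}}]
)
```

After the alias swap you can delete the old index at your leisure; the nightly run then always builds a complete index from scratch instead of updating documents in place.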

However, maybe you can tell us more about the requirements and the external system, because this doesn't seem to be a great setup.

Upvotes: 1
