Reputation: 2082
I'm integrating with an external system.
From it I get 3 files:
customer_data.csv
address_data.csv
additional_customer_data.csv
The order of records in each of them can be random.
There are two relations:
- one-to-many (customer_data => address_data), but I am interested in only one address of a specified kind
- one-to-one (customer_data => additional_customer_data)
Goal:
Merge the files together and put them into one index in Elasticsearch.
Additional info:
- each file has circa 1 million records
- this operation will be done each night
- the data is used only for search purposes
Options:
a) I thought about:
Parse the first file and add each record to ES.
Do the same with the next files and update the documents created in the first step (roughly the approach sketched after these options).
This looks very inefficient.
b) another way:
Parse the first file and add it to a relational database.
Do the same with the other files and update the records from the first step.
Propagate the data to ES.
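Roughly what I mean by a), sketched with the Python elasticsearch client (the customers index name, the semicolon delimiter and the CustomerId / Kind columns are just placeholders, not the real export format):

import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def customer_actions():
    # pass 1: create one document per customer
    with open("customer_data.csv", newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            yield {"_index": "customers", "_id": row["Id"], "_source": row}

def address_actions():
    # pass 2: partial update of the documents created in pass 1;
    # CustomerId, Kind and "SHIPPING" are placeholder names
    with open("address_data.csv", newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            if row.get("Kind") != "SHIPPING":  # keep only the one kind I need
                continue
            yield {"_op_type": "update", "_index": "customers",
                   "_id": row["CustomerId"], "doc": {"address": row}}

helpers.bulk(es, customer_actions())
helpers.bulk(es, address_actions())
# additional_customer_data.csv would be a third pass with the same pattern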
Can you see any other options?
Upvotes: 0
Views: 399
Reputation: 426
I assume you have a normalized relational data structure with 1-to-n relationships in those CSV files, like this:
customer_data.csv
Id;Name;AddressId;AdditionalCustomerDataId;...
0;Mike;2;1;...
address_data.csv
Id;Street;City;...
....
2;Abbey Road;London;...
additional_customer_data.csv
Id;someData;...
...
1;data;...
In that case, I would denormalize those in a preprocessing step into one single CSV and use that to upload them to ES. To avoid downtime, you can then use index aliases. The preprocessing can be done in any language, but loading the CSVs into SQLite tables and joining them there will probably be the fastest; a rough sketch of that follows.
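Something like this in Python, assuming the sample columns above, the official elasticsearch Python client, and that searches go through an alias named customers (the host, index and alias names are placeholders):

import csv
import sqlite3
from datetime import datetime
from elasticsearch import Elasticsearch, helpers

def load_csv(conn, table, path):
    # create a table from the CSV header and bulk-insert all rows
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=";")
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        conn.execute(f"CREATE TABLE {table} ({cols})")
        marks = ", ".join("?" * len(header))
        conn.executemany(f"INSERT INTO {table} VALUES ({marks})", reader)

conn = sqlite3.connect(":memory:")  # or a file if memory gets tight
load_csv(conn, "customer", "customer_data.csv")
load_csv(conn, "address", "address_data.csv")
load_csv(conn, "additional", "additional_customer_data.csv")
conn.execute("CREATE INDEX ix_address ON address(Id)")
conn.execute("CREATE INDEX ix_additional ON additional(Id)")

# denormalize: one flat row per customer
rows = conn.execute("""
    SELECT c.Id, c.Name, a.Street, a.City, x.someData
    FROM customer c
    LEFT JOIN address    a ON a.Id = c.AddressId
    LEFT JOIN additional x ON x.Id = c.AdditionalCustomerDataId
""")

es = Elasticsearch("http://localhost:9200")
new_index = "customers-" + datetime.now().strftime("%Y%m%d%H%M%S")
es.indices.create(index=new_index)

helpers.bulk(es, (
    {"_index": new_index, "_id": cid,
     "_source": {"name": name,
                 "address": {"street": street, "city": city},
                 "someData": some_data}}
    for cid, name, street, city, some_data in rows))

# switch the alias atomically so searches never see a half-built index
actions = [{"add": {"index": new_index, "alias": "customers"}}]
if es.indices.exists_alias(name="customers"):
    for old in list(es.indices.get_alias(name="customers")):
        actions.insert(0, {"remove": {"index": old, "alias": "customers"}})
es.indices.update_aliases(body={"actions": actions})

Every nightly run builds a fresh index and only flips the alias at the end, so the old data stays searchable until the new index is complete.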
I wouldn't choose a strategy that creates just half of each document and adds the additional information later, as every partial update makes Elasticsearch reindex the whole document internally anyway.
However, maybe you can tell us more about the requirements and the external system, because this doesn't seem to be a great solution.
Upvotes: 1