newBie

Reputation: 55

How to add only new or changed docs to Elasticsearch?

Scenario: A script pulls data from an external API, formats each result as a dictionary/JSON object, and pushes the data to Elasticsearch. The script is scheduled to run periodically.

Conditions: The script should push only the records that do not already exist in Elasticsearch. For records that already exist, it should update the fields whose data has changed.

My approach: Each record from the API has an ID. I run a search query to check whether that ID exists in Elasticsearch, build a list of the IDs that do not exist, and push the corresponding records.

Issue: Suppose the record {'ID':1, 'Status':'Started'} was pushed to Elasticsearch yesterday, and the data has since changed to {'ID':1, 'Status':'Completed'}. The changed record is still ignored, because I am checking only the ID.

Solution I am considering: Compare all the fields of the JSON object/dictionary before inserting. If every field matches an existing doc, skip the insertion. If any field has a different value, insert the record. [Having multiple docs for the same record is not an issue. Having multiple docs for the same record with identical values needs to be avoided.]
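One possible way to get the "compare all fields" behaviour without querying first is to derive the document ID from the record's content. This is a minimal sketch, not something stated in the thread: the function name content_id is hypothetical, and it assumes every record is JSON-serializable. Identical records hash to the same ID, so indexing them again would overwrite the same doc instead of creating a duplicate, while a record with any changed field would get a new ID and therefore a new doc.

```python
import hashlib
import json
from typing import Any, Dict


def content_id(record: Dict[str, Any]) -> str:
    """Return a deterministic ID derived from every field of the record.

    Hypothetical helper: keys are sorted so that field order does not
    affect the hash, and non-JSON values fall back to str().
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), default=str
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content (in any key order) gives the same ID; a changed field gives a new one.
started = content_id({"ID": 1, "Status": "Started"})
started_again = content_id({"Status": "Started", "ID": 1})
completed = content_id({"ID": 1, "Status": "Completed"})
```

The resulting string could then be passed as the document ID when indexing. Note that this keeps the old 'Started' doc alongside the new 'Completed' one, which matches the bracketed requirement above but not the "update fields" condition.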

Upvotes: 0

Views: 642

Answers (1)

pacuna

Reputation: 2099

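You can pass the record's ID as the document ID (_id) when calling the index method. This inserts the record if no document with that ID exists; otherwise it replaces the existing document with the new version, so any changed fields are updated. This way you don't need custom logic that stores the ID as a regular field and searches for it before each insert.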
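A rough sketch of that approach with the official Python client, assuming the elasticsearch package is installed and a cluster is reachable. The index name "records", the URL, and the function names are placeholders that do not come from the question. Each record's ID becomes the document _id, and the "index" operation creates the doc or replaces the existing one.

```python
from typing import Any, Dict, Iterable, Iterator

INDEX_NAME = "records"  # placeholder index name, an assumption


def to_bulk_actions(
    records: Iterable[Dict[str, Any]], index: str = INDEX_NAME
) -> Iterator[Dict[str, Any]]:
    """Turn API records into bulk actions keyed by the record's own ID."""
    for record in records:
        yield {
            "_op_type": "index",       # create, or replace if the _id exists
            "_index": index,
            "_id": str(record["ID"]),  # reuse the API's ID as the document ID
            "_source": record,
        }


def push(records: Iterable[Dict[str, Any]], es_url: str = "http://localhost:9200"):
    """Send the actions to Elasticsearch (requires `pip install elasticsearch`)."""
    # Imported here so the action-building part works without the client.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(es_url)  # placeholder URL, adjust for your cluster
    return helpers.bulk(es, to_bulk_actions(records))


# Both versions of record 1 target the same document, so the later one wins.
actions = list(
    to_bulk_actions(
        [{"ID": 1, "Status": "Started"}, {"ID": 1, "Status": "Completed"}]
    )
)
```

With this in place the scheduled script can push every record on each run: unchanged records are simply rewritten with the same content, and no pre-check search query is needed.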

Upvotes: 1
