Reputation: 3301
https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/_indexing_documents.html
Based on the Elasticsearch API documentation, to bulk index data into Elasticsearch you do something like this:
for ($i = 0; $i < 100; $i++) {
    // Action/metadata line for this document
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_type'  => 'my_type',
        ]
    ];

    // The document itself
    $params['body'][] = [
        'my_field'     => 'my_value',
        'second_field' => 'some more values'
    ];
}

$responses = $client->bulk($params);
Basically, you loop through each document, add the same metadata for each one, and then call the bulk function to index the data.
I have data saved in Google Cloud Storage in newline-delimited JSON format. There are hundreds of thousands or millions of documents of the same format in the file (i.e. the same index/type metadata for Elasticsearch).
To bulk index this Google Cloud Storage file into Elasticsearch, I have to read the file, loop through each document in it, assign the same metadata to each document, and then finally call bulk to send everything to Elasticsearch.
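For illustration, a minimal sketch of that loop (assuming the GCS file has already been downloaded locally as documents.json, reusing the hypothetical my_index/my_type names from above, and flushing every 1000 documents) could look like this:

// Sketch: read newline-delimited JSON and bulk index in batches of 1000
$client = Elasticsearch\ClientBuilder::create()->build();

$handle = fopen('documents.json', 'r');   // hypothetical local copy of the GCS file
$params = ['body' => []];
$count  = 0;

while (($line = fgets($handle)) !== false) {
    $doc = json_decode(trim($line), true);
    if ($doc === null) {
        continue; // skip blank or malformed lines
    }

    // Same metadata for every document
    $params['body'][] = ['index' => ['_index' => 'my_index', '_type' => 'my_type']];
    $params['body'][] = $doc;

    // Flush every 1000 documents so the request body stays reasonably small
    if (++$count % 1000 === 0) {
        $client->bulk($params);
        $params = ['body' => []];
    }
}

// Flush whatever is left over
if (!empty($params['body'])) {
    $client->bulk($params);
}

fclose($handle);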
It would be nice if I could provide the metadata just once (basically which index and which type these documents should be indexed into), hand over the whole file (newline-delimited JSON documents), and have the bulk dump do the rest of the work, instead of looping through the file and adding the same metadata to every document.
I know that the Elasticsearch bulk API does not offer this feature yet.
But I assume that bulk indexing a JSON file saved in S3 or Google Cloud Storage into Elasticsearch is a common need.
So someone else might have already run into this use case and solved the issue.
Any advice and suggestions from your experience?
Thanks!
Upvotes: 0
Views: 1845
Reputation: 161
Do you have to do it from PHP? If not, then I think elasticdump should do the trick. It can load data from JSON (and, it seems, from S3 as well) and insert it into ES. If your data sits on GCP, you just need to stream the data from storage and pipe it to elasticdump.
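For example, a rough sketch of that kind of pipeline (assuming gsutil and elasticdump are installed, using placeholder bucket/index names, and checking the elasticdump README for the exact flags and the expected input file format) could be:

# Download the newline-delimited JSON file from GCS, then load it with elasticdump
gsutil cp gs://my-bucket/documents.json ./documents.json
elasticdump --input=./documents.json --output=http://localhost:9200/my_index --type=data

Streaming the file straight from storage into elasticdump (as described above) would avoid the local copy, if elasticdump's stdin input is used.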
Upvotes: 1