Hexie

Reputation: 4221

Elasticsearch Bulk JSON Data

This question arises from this SO thread.

As it seems I have a similar but not the same query, it might be best to have a separate question for others to benefit from, as @Val suggested.

So, similar to the above, I have the need to insert a massive amount of data into an index (my initial testing is about 10 000 documents but this is just for a POC, there are many more). The data I would like to insert is in a .json document and looks something like this (snippet):

[ { "fileName": "filename", "data":"massive string text data here" }, 
  { "fileName": "filename2", "data":"massive string text data here" } ]

By my own admission I am new to ElasticSearch; however, from reading through the documentation, my assumption was that I could take a .json file and create an index from the data within. I have since learnt that each item within the json needs to have a "header", something like:

{"index":{}}
{ "fileName": "filename", "data":"massive string text data here" }

Meaning that this is not actual JSON format (as such) but rather a manipulated string.

I would like to know if there is a way to import my json data as is (in json format), without having to manually manipulate the text first (as my test data has 10 000 entries, I'm sure you can see why I'd prefer not to do this manually).
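For reference, the transformation being described is mechanical: each document in the array gets an action line in front of it. A minimal Node.js sketch (the function name `toBulkNdjson` is illustrative, and the empty `{"index":{}}` action matches the header format shown above):

```javascript
// Convert an array of documents into the NDJSON body expected by the
// Elasticsearch _bulk endpoint: an action line, then the document,
// one per line, with a trailing newline at the end.
function toBulkNdjson(docs) {
  return (
    docs
      .map((doc) => JSON.stringify({ index: {} }) + "\n" + JSON.stringify(doc))
      .join("\n") + "\n"
  );
}

const docs = [
  { fileName: "filename", data: "massive string text data here" },
  { fileName: "filename2", data: "massive string text data here" },
];

console.log(toBulkNdjson(docs));
```

Writing that string to a file and posting it to `_bulk` avoids any hand-editing of the 10 000 entries.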

Any suggestions or suggested automated tools to help with this?

PS - I am using the Windows installer and Postman for the calls.

Upvotes: 11

Views: 10002

Answers (2)

DamarOwen

Reputation: 177

This is the code I use to bulk-index data into Elasticsearch:


const es = require("elasticsearch");
const client = new es.Client({
  hosts: ["http://localhost:9200"],
});

// Load the array of documents to index
const cities = require(<path to your json file>);

// Loop through each city and push two objects into the array per iteration:
// the first object carries the index and type the document will be saved
// under, the second object is the document itself
let bulk = [];

cities.forEach((city) => {
  bulk.push({
    index: {
      _index: <index name>,
      _type: <type name>,
    },
  });

  bulk.push(city);
});

client.bulk({ body: bulk }, function (err, response) {
  if (err) {
    console.log("Failed bulk operation", err);
  } else {
    console.log("Successfully imported %s documents", cities.length);
  }
});

Alternatively, you can use a library like elasticdump or elasticsearch-tools.

Upvotes: 1

Val

Reputation: 217544

You can transform your file very easily with a single shell command. Provided that your file is called input.json, you can do this:

jq -c ".[]" input.json | while IFS= read -r line; do echo '{"index":{}}'; echo "$line"; done > bulk.json

After this you'll have a file called bulk.json which is properly formatted to be sent to the bulk endpoint.
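For the snippet in the question, bulk.json would then contain one action line before each document:

```
{"index":{}}
{"fileName":"filename","data":"massive string text data here"}
{"index":{}}
{"fileName":"filename2","data":"massive string text data here"}
```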

Then you can call your bulk endpoint like this:

curl -XPOST localhost:9200/your_index/your_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @bulk.json

Note: You need to install jq first if you don't have it already.

Upvotes: 19
