user1187968

Bulk index documents from a JSON file into Elasticsearch

I have a sample.json file with the following contents:

{"id":921,"car_make":"Chevrolet","car_model":"Traverse","car_year":2009,"car_color":"Yellow","made_in":"Guinea-Bissau"},
{"id":922,"car_make":"Mitsubishi","car_model":"Eclipse","car_year":1996,"car_color":"Khaki","made_in":"Luxembourg"},
{"id":923,"car_make":"Ford","car_model":"Lightning","car_year":1994,"car_color":"Teal","made_in":"China"},
{"id":924,"car_make":"Mercedes-Benz","car_model":"Sprinter 2500","car_year":2012,"car_color":"Yellow","made_in":"Colombia"},
{"id":925,"car_make":"Nissan","car_model":"Maxima","car_year":2002,"car_color":"Yellow","made_in":"Kazakhstan"},
{"id":926,"car_make":"Chrysler","car_model":"Pacifica","car_year":2006,"car_color":"Crimson","made_in":"China"}

What command should I use to index each row into Elasticsearch? So far I have tried the following, but it does not work:

>> curl -XGET 'localhost:9200/car/car' -d @sample.json 
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

Also tried:

curl -XGET 'localhost:9200/car/inventory/_bulk' -H 'Content-Type: application/json' -d @sample.json 
{"_index":"car","_type":"inventory","_id":"_bulk","found":false}


Answers (1)

derickson82

You will want to use the Bulk API.

The documentation does a good job of explaining everything, but watch out for the following things:

  • Your file should be newline-delimited JSON (NDJSON), sent with application/x-ndjson as the Content-Type. Each line must be a complete JSON object on its own, so remove the trailing commas between records.
  • Each record takes two lines: an "Action/Metadata" line, followed by the source JSON line.
  • Your file MUST end with a newline character.
  • When using curl, use --data-binary rather than -d so that the newline characters are preserved.
  • The URL path can be just _bulk, without an index or type, but then every metadata line must include the index and type. If you specify the index and type in the URL instead, the metadata lines do not need the _index and _type fields.
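The points above can be checked before sending anything. The following is a minimal sketch, not part of the Bulk API itself: check_bulk_payload is a hypothetical helper name, and it assumes the payload contains only index-style actions (every action line is followed by a source line), which is not true for delete actions.

```python
import json


def check_bulk_payload(payload: str) -> list:
    """Return a list of problems found in a bulk payload (empty if none found).

    Assumption: every action line is followed by a source line, as with
    "index" actions. Payloads containing "delete" actions would be
    flagged incorrectly by the pair check.
    """
    problems = []

    # The payload must end with a newline character.
    if not payload.endswith("\n"):
        problems.append("payload does not end with a newline")

    lines = payload.splitlines()

    # Two lines per record: action/metadata, then source.
    if len(lines) % 2 != 0:
        problems.append("odd number of lines; expected action/source pairs")

    # Each line must parse as JSON on its own; a trailing comma breaks this.
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number} is not a standalone JSON value")

    return problems
```

Running this on the original sample.json would report the lines that still end with a comma, as well as the odd or unpaired lines.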

Taking your example, your file would look something like this:

{ "index" : { "_index" : "car", "_type" : "car", "_id" : "921" } }
{"id":921,"car_make":"Chevrolet","car_model":"Traverse","car_year":2009,"car_color":"Yellow","made_in":"Guinea-Bissau"}
{ "index" : { "_index" : "car", "_type" : "car", "_id" : "922" } }
{"id":922,"car_make":"Mitsubishi","car_model":"Eclipse","car_year":1996,"car_color":"Khaki","made_in":"Luxembourg"}
{ "index" : { "_index" : "car", "_type" : "car", "_id" : "923" } }
{"id":923,"car_make":"Ford","car_model":"Lightning","car_year":1994,"car_color":"Teal","made_in":"China"}
{ "index" : { "_index" : "car", "_type" : "car", "_id" : "924" } }
{"id":924,"car_make":"Mercedes-Benz","car_model":"Sprinter 2500","car_year":2012,"car_color":"Yellow","made_in":"Colombia"}
{ "index" : { "_index" : "car", "_type" : "car", "_id" : "925" } }
{"id":925,"car_make":"Nissan","car_model":"Maxima","car_year":2002,"car_color":"Yellow","made_in":"Kazakhstan"}
{ "index" : { "_index" : "car", "_type" : "car", "_id" : "926" } }
{"id":926,"car_make":"Chrysler","car_model":"Pacifica","car_year":2006,"car_color":"Crimson","made_in":"China"}

The curl command then sets the Content-Type header to application/x-ndjson and sends the file with --data-binary, so it would look something like this:

curl -XPOST -H "Content-Type: application/x-ndjson" localhost:9200/_bulk --data-binary @sample.json 
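If you would rather send the request from a script, the same call can be sketched with Python's standard library. This is only an illustration under assumptions: build_bulk_request is a hypothetical helper name, and the URL simply mirrors the localhost:9200 address from the curl command. The sending part is left commented out because it needs a running cluster.

```python
import urllib.request


def build_bulk_request(
    payload: bytes, url: str = "http://localhost:9200/_bulk"
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    return urllib.request.Request(
        url,
        data=payload,  # raw bytes, so newline characters are sent unchanged
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Sending it (requires a running Elasticsearch node):
# with open("sample.json", "rb") as f:
#     req = build_bulk_request(f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Reading the file in binary mode plays the same role as --data-binary: the bytes, including the final newline, are passed through as they are.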
