Reputation: 497
I'm trying to import a large JSON document into Elasticsearch 5.1. A small section of the data looks like this:
[
  {
    "id": 1,
    "region": "ca-central-1",
    "eventName": "CreateRole",
    "eventTime": "2016-02-04T03:41:19.000Z",
    "userName": "[email protected]"
  },
  {
    "id": 2,
    "region": "ca-central-1",
    "eventName": "AddRoleToInstanceProfile",
    "eventTime": "2016-02-04T03:41:19.000Z",
    "userName": "[email protected]"
  },
  {
    "id": 3,
    "region": "ca-central-1",
    "eventName": "CreateInstanceProfile",
    "eventTime": "2016-02-04T03:41:19.000Z",
    "userName": "[email protected]"
  },
  {
    "id": 4,
    "region": "ca-central-1",
    "eventName": "AttachGroupPolicy",
    "eventTime": "2016-02-04T01:42:36.000Z",
    "userName": "[email protected]"
  },
  {
    "id": 5,
    "region": "ca-central-1",
    "eventName": "AttachGroupPolicy",
    "eventTime": "2016-02-04T01:39:20.000Z",
    "userName": "[email protected]"
  }
]
I'd like to import the data without making any changes to the source file if possible, so I believe that rules out the _bulk API, since I'd need to add an action line for each entry.
I've tried several different methods but have not had any luck. Am I wasting my time trying to import this document as-is?
I've tried:
curl -XPOST 'demo.ap-southeast-2.es.amazonaws.com/rea/test' --data-binary @Records.json
But that fails with an error:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}},"status":400}
Thanks!
Upvotes: 1
Views: 240
Reputation: 2045
If you don't want to modify the file, the bulk API will not work as-is.
You can have a look at jq. It is a command-line JSON processor, and it can generate the newline-delimited body that the bulk API requires.
cat Records.json |
jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
. '
You can try something like this and pass its output to the bulk API. Hope this helps.
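For reference, on the sample data above this filter emits one action line followed by one document line per record (index_name and type_name are placeholders); for the first record the output looks like:

```
{"index":{"_index":"index_name","_type":"type_name"}}
{"id":1,"region":"ca-central-1","eventName":"CreateRole","eventTime":"2016-02-04T03:41:19.000Z","userName":"[email protected]"}
```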
You can also try making a curl call, which would be something like this:
cat Records.json |
jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
. ' | curl -XPOST demo.ap-southeast-2.es.amazonaws.com/_bulk --data-binary @-
I have not tried the second part, but it should work.
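If jq is not available, the same interleaving can be sketched in a few lines of plain Python; as above, index_name and type_name are placeholders you would replace with your own values:

```python
import json

def to_bulk(records, index="index_name", doc_type="type_name"):
    """Interleave a bulk action line before each source document,
    producing the newline-delimited body the _bulk API expects."""
    out = []
    for doc in records:
        out.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        out.append(json.dumps(doc))
    return "\n".join(out) + "\n"  # _bulk requires a trailing newline

# Two of the sample records from the question:
records = [
    {"id": 1, "region": "ca-central-1", "eventName": "CreateRole",
     "eventTime": "2016-02-04T03:41:19.000Z", "userName": "[email protected]"},
    {"id": 2, "region": "ca-central-1", "eventName": "AddRoleToInstanceProfile",
     "eventTime": "2016-02-04T03:41:19.000Z", "userName": "[email protected]"},
]
print(to_bulk(records), end="")
```

Piping this script's output into the same curl ... /_bulk --data-binary @- call would mirror the jq pipeline above.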
Upvotes: 1
Reputation: 599
You might want to check out stream2es - it's a helpful utility for sending documents to Elasticsearch, and I think it may do what you need.
Once you have it installed, you should be able to use it something like this:
cat Records.json | ./stream2es stdin --target 'http://demo.ap-southeast-2.es.amazonaws.com/rea/test'
Upvotes: 0