lohitaksh yogi

Reputation: 111

Elasticsearch and AWS with Python

I am working with AWS Elasticsearch using Python. I have a JSON file with 3 fields.

The fields are "cat1", "cat2" and "cat3", and each row is on its own line (rows are separated by \n),
for example: cat1: food, cat2: wine, cat3: lunch, etc.
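If each line of the file is a standalone JSON object (an assumption, since the question only shows the field values), the file can be read into Python dicts one line at a time. read_records is a hypothetical helper name; it takes any iterable of lines, so an open file object works directly:

```python
import json
from typing import Iterable, Iterator


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line of newline-delimited JSON.

    Assumes every line is a complete JSON object such as
    {"cat1": "food", "cat2": "wine", "cat3": "lunch"}.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc
```

Usage (the file name is a placeholder): with open('data.json', encoding='utf-8') as f: records = list(read_records(f))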

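from requests_aws4auth import AWS4Auth
import boto3
import requests

# Sign requests with the current AWS credentials (SigV4, service 'es').
region = 'us-east-1'  # placeholder: your domain's region
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, 'es', session_token=credentials.token)

# Placeholder endpoint; index names must be lowercase
url = 'https://your-elastic-endpoint/categoryall'

payload = {
    "settings": {
        "number_of_shards": 10,
        "number_of_replicas": 5
    },
    "mappings": {
        # Mapping types such as "Categoryall" only apply to Elasticsearch 6.x
        # and earlier; on 7.x+ put "properties" directly under "mappings".
        "Categoryall": {
            "properties": {
                # "string" was replaced by "text"/"keyword" in Elasticsearch 5.0
                "cat1": {"type": "text"},
                "cat2": {"type": "text"},
                "cat3": {"type": "text"}
            }
        }
    }
}

r = requests.put(url, auth=awsauth, json=payload)
r.raise_for_status()  # fail loudly if the index could not be created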

I created the mapping for the index as shown above, but I don't know how to populate the index. I was thinking of looping over the JSON file and sending a POST request for each record, but I have no idea how to proceed.
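The per-record loop described above does work, but it costs one HTTP round trip per document. A minimal sketch, assuming Elasticsearch 7.x or later (where a document is added with POST /<index>/_doc and the server generates its id) and a requests-compatible session; index_one_by_one is a hypothetical name, and auth would be the awsauth signer from the question's code:

```python
from typing import Iterable, List


def index_one_by_one(records: Iterable[dict], index_url: str,
                     auth=None, session=None) -> List[str]:
    """POST each record to <index_url>/_doc and return the generated ids.

    One request per document: simple, but slow for large files
    compared with the Bulk API.
    """
    if session is None:
        import requests  # imported here so a stub session can be injected
        session = requests.Session()
    ids = []
    for doc in records:
        r = session.post(f"{index_url}/_doc", json=doc, auth=auth)
        r.raise_for_status()
        ids.append(r.json()["_id"])  # id chosen by Elasticsearch
    return ids
```

Usage: index_one_by_one(records, 'https://your-elastic-endpoint/categoryall', auth=awsauth)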

I want to create the index and bulk-upload this file into it. Any suggestions would be appreciated.

Upvotes: 1

Views: 2175

Answers (1)

dk-na

Reputation: 141

Take a look at the Elasticsearch Bulk API.

Basically, you need to build a bulk request body and POST it to your "https://{elastic-endpoint}/_bulk" URL.

The following example shows a bulk request that inserts 3 JSON records into an index called "my_index":

{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "1" } }
{ "cat1" : "food 1", "cat2": "wine 1", "cat3": "lunch 1" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "2" } }
{ "cat1" : "food 2", "cat2": "wine 2", "cat3": "lunch 2" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "3" } }
{ "cat1" : "food 3", "cat2": "wine 3", "cat3": "lunch 3" }

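Each record is represented by two JSON objects on separate lines: an action/metadata line ("index", naming the target _index, _type and _id) followed by the document itself. (_type belongs to Elasticsearch 6.x; it is deprecated in 7.x and removed in 8.x.)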
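Building that body from Python dicts is mostly string joining. A sketch, assuming Elasticsearch 7.x or later, so the action line omits _type; build_bulk_body and its id_field parameter are hypothetical, and when id_field is not given, _id is left out and Elasticsearch generates ids:

```python
import json
from typing import Iterable, Optional


def build_bulk_body(records: Iterable[dict], index: str,
                    id_field: Optional[str] = None) -> str:
    """Return newline-delimited JSON for the _bulk endpoint.

    Each record produces two lines: an "index" action with metadata,
    then the document source.
    """
    lines = []
    for doc in records:
        meta = {"_index": index}
        if id_field is not None:
            meta["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))  # json.dumps never emits raw newlines
    # The Bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)
```

Writing the result to post-data.txt gives a file that can be posted as shown next: with open('post-data.txt', 'w', encoding='utf-8') as f: f.write(build_bulk_body(records, 'my_index'))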

So if you write your bulk request body into a file called post-data.txt (the body must end with a newline), you can post it with Python like this:

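import requests

# awsauth is the AWS4Auth signer created as in the question
with open('post-data.txt', 'rb') as payload:
    r = requests.post('https://your-elastic-endpoint/_bulk', auth=awsauth,
                      data=payload,
                      headers={'Content-Type': 'application/x-ndjson'})
r.raise_for_status()
print(r.json()['errors'])  # True if any individual record failed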
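For a large file, a single request can exceed the domain's maximum HTTP request payload size (on Amazon's service this limit depends on the instance type), and a 200 response does not guarantee every record was indexed: the response's "errors" flag and per-item results need checking. A sketch that sends records in fixed-size chunks and collects the failed items, assuming Elasticsearch 7.x+ and a requests-compatible session; bulk_upload and chunk_size=500 are hypothetical, arbitrary choices:

```python
import json
from itertools import islice
from typing import Iterable, List


def bulk_upload(records: Iterable[dict], bulk_url: str, index: str,
                auth=None, session=None, chunk_size: int = 500) -> List[dict]:
    """Send records to the _bulk endpoint in chunks; return the failed items."""
    if session is None:
        import requests  # imported here so a stub session can be injected
        session = requests.Session()
    action = json.dumps({"index": {"_index": index}})
    failed = []
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        body = "".join(f"{action}\n{json.dumps(doc)}\n" for doc in chunk)
        r = session.post(bulk_url, data=body.encode("utf-8"), auth=auth,
                         headers={"Content-Type": "application/x-ndjson"})
        r.raise_for_status()  # request-level failures (auth, payload too large, ...)
        result = r.json()
        if result.get("errors"):
            # Each item is keyed by its action, e.g. {"index": {..., "error": {...}}}
            for item in result["items"]:
                outcome = next(iter(item.values()))
                if "error" in outcome:
                    failed.append(outcome)
    return failed
```

Usage: failed = bulk_upload(records, 'https://your-elastic-endpoint/_bulk', 'my_index', auth=awsauth)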

Alternatively, you can use the bulk helpers in the Python elasticsearch client (elasticsearch.helpers.bulk).
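With the helpers you pass an iterable of action dicts instead of building the body yourself, and the helper splits it into chunks. A sketch assuming the 7.x elasticsearch Python client, which accepts RequestsHttpConnection so the AWS4Auth signer can be reused (the 8.x client configures connections differently, and Amazon OpenSearch Service domains would use opensearch-py instead). host is the bare endpoint hostname without https://, and both function names are hypothetical:

```python
from typing import Iterable, Iterator


def bulk_actions(records: Iterable[dict], index: str) -> Iterator[dict]:
    """Turn plain documents into actions for helpers.bulk (default op: index)."""
    for doc in records:
        yield {"_index": index, "_source": doc}


def upload_with_helpers(records: Iterable[dict], host: str, awsauth, index: str):
    # Imported lazily so bulk_actions can be used without the client installed.
    from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers

    es = Elasticsearch(
        hosts=[{"host": host, "port": 443}],
        http_auth=awsauth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )
    # Returns (number of successful actions, list of errors); with the default
    # raise_on_error=True it raises helpers.BulkIndexError if any document fails.
    return helpers.bulk(es, bulk_actions(records, index))
```

Because bulk_actions is a generator, records read lazily from the file (for example with a line-by-line reader) are never all held in memory at once.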

Upvotes: 1
