fccoelho

Reputation: 6204

How to Bulk index in Elastic Search using the Python API

I am trying to bulk insert a lot of documents into elastic search using the Python API.

import elasticsearch
from elasticsearch.helpers import bulk
from pymongo import MongoClient

es = elasticsearch.Elasticsearch()

def index_collection(db, collection, fields, host='localhost', port=27017):
    conn = MongoClient(host, port)
    coll = conn[db][collection]
    cursor = coll.find({}, fields=fields, timeout=False)
    print "Starting Bulk index of {} documents".format(cursor.count())

    def action_gen():
        """
        Generator to use for bulk inserts
        """
        for n, doc in enumerate(cursor):

            op_dict = {
                '_index': db.lower(),
                '_type': collection,
                '_id': int('0x' + str(doc['_id']), 16),
            }
            doc.pop('_id')
            op_dict['_source'] = doc
            yield op_dict

    res = bulk(es, action_gen(), stats_only=True)
    print res

The documents come from a MongoDB collection, and I am using the function above to do the bulk indexing in the way explained in the docs.

However, the bulk indexing fills Elasticsearch with thousands of empty documents. Can anyone tell me what I am doing wrong?

Upvotes: 1

Views: 6898

Answers (1)

Sloan Ahrens

Reputation: 8718

I've never seen the bulk data put together that way, especially what you're doing with "_source". There may be a way to get that to work, I don't know off-hand, but when I tried it I got weird results.
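
For reference, the shape the question seems to be aiming at with the `elasticsearch.helpers.bulk` helper looks roughly like this (a minimal sketch; the index name, type, and fields here are illustrative, and the final `bulk()` call assumes a reachable cluster):

```python
# Sketch of the per-document action format accepted by
# elasticsearch.helpers.bulk (names are illustrative assumptions).
def make_actions(docs, index_name, doc_type):
    for doc in docs:
        yield {
            '_index': index_name,
            '_type': doc_type,
            '_id': str(doc.pop('_id')),
            '_source': doc,  # the remaining fields become the document body
        }

docs = [{'_id': 1, 'title': 'a'}, {'_id': 2, 'title': 'b'}]
actions = list(make_actions(docs, 'test_index', 'doc'))
# The iterable of actions would then be passed straight to the helper:
# from elasticsearch.helpers import bulk
# bulk(es, actions)
```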

If you look at the bulk API, ES expects a metadata document followed by the document to be indexed, so you need two entries in your bulk data list for each document. Maybe something like:

import elasticsearch
from pymongo import MongoClient

es = elasticsearch.Elasticsearch()

def index_collection(db, collection, fields, host='localhost', port=27017):
    conn = MongoClient(host, port)
    coll = conn[db][collection]
    cursor = coll.find({}, fields=fields, timeout=False)
    print("Starting Bulk index of {} documents".format(cursor.count()))

    bulk_data = []

    for doc in cursor:
        # Action/metadata entry first, then the source document itself
        bulk_data.append({
            'index': {
                '_index': db.lower(),
                '_type': collection,
                '_id': int('0x' + str(doc['_id']), 16),
            }
        })
        doc.pop('_id')  # the ObjectId is not JSON-serializable
        bulk_data.append(doc)

    es.bulk(index=db.lower(), body=bulk_data, refresh=True)

I didn't try running that code, though. Here is a script I know works, which you can play with if it helps:

from elasticsearch import Elasticsearch

es_client = Elasticsearch(hosts = [{ "host" : "localhost", "port" : 9200 }])

index_name = "test_index"

if es_client.indices.exists(index_name):
    print("deleting '%s' index..." % (index_name))
    print(es_client.indices.delete(index = index_name, ignore=[400, 404]))

print("creating '%s' index..." % (index_name))
print(es_client.indices.create(index = index_name))

bulk_data = []

for i in range(4):
    bulk_data.append({
        "index": {
            "_index": index_name, 
            "_type": 'doc', 
            "_id": i
        }
    })
    bulk_data.append({ "idx": i })

print("bulk indexing...")
res = es_client.bulk(index=index_name,body=bulk_data,refresh=True)
print(res)

print("results:")
for doc in es_client.search(index=index_name)['hits']['hits']:
    print(doc)

Upvotes: 2
