Náthali
Náthali

Reputation: 937

ElasticSearch query to pandas dataframe

I have a query:

s = Search(using=client, index='myindex', doc_type='mytype')
s.query = Q('bool', must=[Q('match', BusinessUnit=bunit),
                          Q('range', **dicdate)])

res = s.execute()

return me 627033 lines, I want to convert this dictionary in a dataframe with 627033 lines

Upvotes: 2

Views: 5188

Answers (3)

DSJ529
DSJ529

Reputation: 1

I found the solution by Phil B a good template for my situation. However, all results are returned as lists, rather than atomic data types. To get around this, I added the following helper function and code:

def flat_data(val):
  if isinstance(val):
    return val[0]
  else:
    return val
df = pd.DataFrame.from_dict([{k:flat_data(v) for (k,v) in document(['fields'].items()} 
                            for document in results])

Upvotes: 0

Phil B
Phil B

Reputation: 6037

If your request is likely to return more than 10,000 documents from Elasticsearch, you will need to use the scrolling function of Elasticsearch. Documentation and examples for this function are rather difficult to find, so I will provide you with a full, working example:

import pandas as pd
from elasticsearch import Elasticsearch
import elasticsearch.helpers


es = Elasticsearch('127.0.0.1',
        http_auth=('my_username', 'my_password'),
        port=9200)

body={"query": {"match_all": {}}}
results = elasticsearch.helpers.scan(es, query=body, index="my_index")
df = pd.DataFrame.from_dict([document['_source'] for document in results])

Simply edit the fields that start with "my_" to correspond to your own values

Upvotes: 3

a mark
a mark

Reputation: 95

Based on your comment I think what you're looking for is size:

es.search(index="my-index", doc_type="mydocs", body="your search", size="1000")

I'm not sure if this will work for 627,033 lines -- you might need scroll for that.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

Upvotes: 2

Related Questions