Reputation: 937
I have a query:

    s = Search(using=client, index='myindex', doc_type='mytype')
    s.query = Q('bool', must=[Q('match', BusinessUnit=bunit),
                              Q('range', **dicdate)])
    res = s.execute()

It returns 627,033 hits, and I want to convert the result into a DataFrame with 627,033 rows.
Upvotes: 2
Views: 5188
Reputation: 1
I found the solution by Phil B a good template for my situation. However, all results are returned as lists rather than atomic data types. To get around this, I added the following helper function and code:

    def flat_data(val):
        if isinstance(val, list):
            return val[0]
        else:
            return val

    df = pd.DataFrame.from_dict([{k: flat_data(v) for (k, v) in document['fields'].items()}
                                 for document in results])
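For illustration, here is how that helper behaves on sample hits. The field names and values below are made up, not from the original question; the only assumption is that each hit carries its data under a `'fields'` key with every value wrapped in a list:

```python
import pandas as pd

def flat_data(val):
    # Unwrap single-element lists; leave scalars untouched
    if isinstance(val, list):
        return val[0]
    return val

# Hypothetical hits shaped like Elasticsearch "fields"-style results
results = [
    {'fields': {'BusinessUnit': ['Sales'], 'amount': [100]}},
    {'fields': {'BusinessUnit': ['HR'], 'amount': [250]}},
]

df = pd.DataFrame.from_dict([{k: flat_data(v) for (k, v) in document['fields'].items()}
                             for document in results])
print(df)
```

Each row of `df` then holds plain scalars instead of one-element lists.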
Upvotes: 0
Reputation: 6037
If your request is likely to return more than 10,000 documents from Elasticsearch, you will need to use the scrolling function of Elasticsearch. Documentation and examples for this function are rather difficult to find, so I will provide you with a full, working example:
    import pandas as pd
    from elasticsearch import Elasticsearch
    import elasticsearch.helpers

    es = Elasticsearch('127.0.0.1',
                       http_auth=('my_username', 'my_password'),
                       port=9200)

    body = {"query": {"match_all": {}}}
    results = elasticsearch.helpers.scan(es, query=body, index="my_index")
    df = pd.DataFrame.from_dict([document['_source'] for document in results])
Simply edit the fields that start with "my_" to correspond to your own values.
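The final conversion line does not depend on a live cluster: `elasticsearch.helpers.scan` yields one dict per document with the body under `'_source'`, so the reshaping can be sketched with stand-in documents (the field names here are invented for illustration):

```python
import pandas as pd

# Fake stand-ins for what elasticsearch.helpers.scan yields:
# each item is a dict with the document body under '_source'
results = iter([
    {'_index': 'my_index', '_id': '1', '_source': {'BusinessUnit': 'Sales', 'amount': 100}},
    {'_index': 'my_index', '_id': '2', '_source': {'BusinessUnit': 'HR', 'amount': 250}},
])

# Keep only the document bodies, one row per document
df = pd.DataFrame.from_dict([document['_source'] for document in results])
print(df)
```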
Upvotes: 3
Reputation: 95
Based on your comment I think what you're looking for is size:
    es.search(index="my-index", doc_type="mydocs", body="your search", size="1000")
I'm not sure if this will work for 627,033 lines -- you might need scroll for that.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
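The scroll pattern behind that link can be sketched as a loop. The `search` and `scroll` methods exist on the official Python client; the keep-alive value, page size, and function name here are illustrative choices, not anything from the linked page:

```python
def scroll_all(es, index, query, page_size=1000, keep_alive='2m'):
    """Collect every hit by paging through the Elasticsearch scroll API.

    `es` is assumed to be an elasticsearch-py client (or anything with
    compatible `search` and `scroll` methods).
    """
    # First request opens the scroll context and returns the first page
    page = es.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = page['_scroll_id']
    hits = page['hits']['hits']
    all_hits = list(hits)
    # An empty page signals that the scroll is exhausted
    while hits:
        page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
        scroll_id = page['_scroll_id']
        hits = page['hits']['hits']
        all_hits.extend(hits)
    return all_hits
```

Feeding `all_hits` (or their `'_source'` bodies) into `pd.DataFrame.from_dict` then works the same way as in the `scan` answer above.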
Upvotes: 2