shantanuo
shantanuo

Reputation: 32316

pandas to_json output in double quotes, to feed to Elasticsearch

The to_json method of Pandas DataFrame returns the data correctly. But I am not able to process it in the next step. For e.g.

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

myst="""
20-01-17    pizza   90
21-01-17    pizza   120
22-01-17    pizza   239
23-01-17    pizza   200
20-01-17    fried-rice  100
21-01-17    fried-rice  120
22-01-17    fried-rice  110
23-01-17    fried-rice  190
20-01-17    ice-cream   8
21-01-17    ice-cream   23
22-01-17    ice-cream   21
23-01-17    ice-cream   100
"""
u_cols=['date', 'product', 'sales']

myf = StringIO(myst)
import pandas as pd
df = pd.read_csv(StringIO(myst), sep='\s+', names = u_cols)

The next step is to export the data to JSON to be imported in Elasticsearch.

tmp=df.to_json(orient="records")
import json
json.loads(tmp)

This returns the following (invalid JSON) output:

[{'date': '20-01-17', 'product': 'pizza', 'sales': 90},
 {'date': '21-01-17', 'product': 'pizza', 'sales': 120},
 {'date': '22-01-17', 'product': 'pizza', 'sales': 239},
 {'date': '23-01-17', 'product': 'pizza', 'sales': 200},
 {'date': '20-01-17', 'product': 'fried-rice', 'sales': 100},
 {'date': '21-01-17', 'product': 'fried-rice', 'sales': 120},
 {'date': '22-01-17', 'product': 'fried-rice', 'sales': 110},
 {'date': '23-01-17', 'product': 'fried-rice', 'sales': 190},
 {'date': '20-01-17', 'product': 'ice-cream', 'sales': 8},
 {'date': '21-01-17', 'product': 'ice-cream', 'sales': 23},
 {'date': '22-01-17', 'product': 'ice-cream', 'sales': 21},
 {'date': '23-01-17', 'product': 'ice-cream', 'sales': 100}]

It seems that Elastic does not like single quotes. How do I get the same output as above in double quotes?

Upvotes: 1

Views: 998

Answers (1)

Davide Fiocco
Davide Fiocco

Reputation: 5914

Not sure it helps, but adding after your code something along the lines of

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

actions = [
     {
     '_index' : 'transactions',
     '_type' : 'content',
     '_date' : rec['date'],
     '_product' : rec['product'],
     '_sales' : rec['sales'],
     }
for rec in json.loads(tmp)
]

bulk(es, actions)

should allow the creation of an index.

Upvotes: 1

Related Questions