Rothelhurt
Rothelhurt

Reputation: 23

How to convert Pandas DataFrame with JSON-strings into valid JSON?

Im trying to save lage excel file with multiple shets into JSON-file with pandas. I need in result get structure like this:

{ 'Sheet1':
   [ 'column1': value,
     'column2': value,
     'column3': value,
     'column4': value ]
'Sheet2':
   [ 'column1': value,
     'column2': value,
     'column3': value,
     'column4': {'json_key1': value,
                 'json_key2': value,}
   ]
 }

Im tried this code to get this:

import pandas as pd
import json

EXCEL_FILE = 'example_data.xlsm'
JSON_FILE = 'json_data.json'

sheets = pd.ExcelFile(EXCEL_FILE).sheet_names
json_data = {}
for sheet in sheets:
    df = pd.read_excel(EXCEL_FILE, index_col=None, header=0, sheet_name=sheet, na_values='null')
    json_data[sheet] = json.loads(df.to_json(orient='records', force_ascii=False, date_format='iso'))
with open(JSON_FILE, 'w',  encoding='utf-8') as json_file:
    json.dump(json_data, json_file, indent=2, ensure_ascii=False)

There are several columns in excel with json-like strings. [1]: https://i.sstatic.net/gvc0K.png

When I export into JSON with df.to_json() it saves this columns like this:

{
  "acts_31L": [
    {
      "ID": 219100060,
      "ID_ETD": null,
      "INDEX_NUM": "31-7635-191022195410",
      "IT_SECTIONS": "{\"CTIME\":\"2019-10-22 21:26:41.680\",\"section\":{\"CTIME\":\"2019-10-22 21:26:41.680\",\"SERIE\":\"506\",\"SERIE_NAME\":\"ТЭП70\",\"SER_NUM\":\"00000542\",\"SEC_CODE\":\"0\",\"EL_COUNT\":0,\"FUEL_LIT\":0.0,\"FUEL_DENS\":0.8,\"FUEL_KG\":0.0,\"IS_NEED\":\"1\"}}",
      "IT_INVENT": "{\"CTIME\":\"2019-10-22 21:26:41.680\",\"inv\":{\"CTIME\":\"2019-10-22 21:26:41.680\",\"INVENT_NAME\":\"Пенька\",\"UNIT\":\"шт.\",\"NORMA\":0,\"FACT\":0,\"INFO_TYPE\":\"0\"}}"
    },

So how to save this strings as json-object?

Upvotes: 2

Views: 445

Answers (1)

deadshot
deadshot

Reputation: 9071

Before converting dataframe to json convert the columns IT_SECTIONS and IT_INVENT to dict using ast.literal_eval. Then you can convert it to json.

from ast import literal_eval

for sheet in sheets:
    df = pd.read_excel(EXCEL_FILE, index_col=None, header=0, sheet_name=sheet, na_values='null')
    df['IT_SECTIONS'] = df['IT_SECTIONS'].apply(lambda x: literal_eval(str(x)))
    df['IT_INVENT'] = df['IT_INVENT'].apply(lambda x: literal_eval(str(x)))
    json_data[sheet] = json.loads(df.to_json(orient='records', force_ascii=False, date_format='iso'))

Upvotes: 2

Related Questions