vineet_ssrx
vineet_ssrx

Reputation: 43

How to convert json to pd.dataframe in Python

I have a json file that i need convert to Pandas DataFrame.

json:

{'@odata.context': 'http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule', 'days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], 'times': ['00:30'], 'enabled': False, 'localTimeZoneId': 'UTC', 'notifyOption': 'MailOnFailure'}

I have tried below codes one by one but they all returned: ValueError: arrays must all be same length

1)

test_df = pd.DataFrame(
                    {'@odata.context': 'http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule', 
                     'days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], 
                     'times': ['00:30'], 
                     'enabled': False, 
                     'localTimeZoneId': 'UTC', 
                     'notifyOption': 'MailOnFailure'})

2) Also tried working with "" as someone suggested in similar stackoverflow question

test_df = pd.DataFrame(
                    {"@odata.context": "http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule", 
                     "days": ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"], 
                     "times": ["00:30"], 
                     "enabled": False, 
                     "localTimeZoneId": "UTC", 
                     "notifyOption": "MailOnFailure"})

3)

test_df = pd.DataFrame(
                    {"@odata.context": "http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule", 
                     "days": ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"], 
                     "times": ["00:30"], 
                     "enabled": "False", 
                     "localTimeZoneId": "UTC", 
                     "notifyOption": "MailOnFailure"})

Upvotes: 1

Views: 380

Answers (1)

Trenton McKinney
Trenton McKinney

Reputation: 62503

It all depends on the file:

  • The first issue is getting the data from the file, which depends on the format of the file
  • The sample data used, is five repeating rows of the single quoted data at the top of the question.

If the file is a bunch of dicts, each on a newline:

{'@odata.context': 'http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule', 'days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], 'times': ['00:30'], 'enabled': False, 'localTimeZoneId': 'UTC', 'notifyOption': 'MailOnFailure'}
{...}
{...}
{...}
{...}

Code to create dataframe:

  • encoding="utf8" can be removed if it's not required
  • pandas.io.json.json_normalize is used to normalize semi-structured JSON data into a flat table.
    • This means it will flatten the nested lists.
import pandas as pd
from pandas.io.json import json_normalize
from ast import literal_eval


line_list = list()
with open("test.json", encoding="utf8") as f:
    for line in f:
        line = literal_eval(line)
        line_list.append(line)

df = json_normalize(line_list, ['days'], ['@odata.context', 'enabled', 'localTimeZoneId', 'notifyOption', 'times'],)

If the file is a list of dicts:

[{...},
 {...},
 {...},
 {...},
 {...}]

Code to create dataframe:

with open("test.json", encoding="utf8") as f:
    data = literal_eval(f.read())

df = json_normalize(data, ['days'], ['@odata.context', 'enabled', 'localTimeZoneId', 'notifyOption', 'times'])

dataframe output:

         0                                                                                                      @odata.context enabled localTimeZoneId   notifyOption  times
    Sunday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Monday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
   Tuesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
 Wednesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Thursday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Friday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Saturday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Sunday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Monday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
   Tuesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
 Wednesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Thursday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Friday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Saturday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30

Upvotes: 1

Related Questions