Reputation: 99
I have a large json file that I need to read into a pandas dataframe without using the json module. Here is a link to the file melbourne_bike_share.json. I didn't know what to cut out to make a minimal example.
Every way I try to get from reading the file to a dataframe, I get the same error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
I have tried reading it straight in:
import pandas as pd
mbs = pd.read_json('Melbourne_bike_share.json')
and making sure my string to be read by pd.read_json() is correct:
with open("Melbourne_bike_share.json", encoding="utf8") as mbs_json:
mbs_string = mbs_json.readlines()
mbs_string = [line.rstrip() for line in mbs_string]
mbs_string = ''.join(mbs_string)
mbs = pd.read_json('Melbourne_bike_share.json')
But still get the same ValueError. I cannot find what is causing this error, what it really means, or pretty much any asked and answered questions involving reading json files that don't just suggest using the json module, which I cannot do.
I am new to python and json files. From what I gather, the next step after being able to read the json file is to flatten it:
from pandas.io.json import json_normalize
df = json_normalize(mbs)
after which, I will have my dataframe and can manipulate that.
Post Edited to show the expected first line of the resultant dataframe. First line is the column headers (default indexes until I figure out how to pull the column headers from the metadata in the json file). I spaced the column headers 0-5 to align with their values, after that, the format here puts them on a new row. sorry about that. hope you can see that they should all align. Second line is the first row.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 155 7C09387D-9E6C-4B42-9041-9A98B88F54BB 155 1428899388 880594 1453945520 880594 {\n "invalidCells" : {\n "27624917" : "22/... 2 Harbour Town - Docklands Dve - Docklands 60000 9 14 1453985105 [{"address":"","city":"","state":"","zip":""},..
Any help would be appreciated.
Upvotes: 0
Views: 2250
Reputation: 14063
I think you are looking for something like this:
import pandas as pd
df = pd.read_json('Melbourne_bike_share.json', typ='series')
pd.DataFrame(data=df['data'])
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 155 7C09387D-9E6C-4B42-9041-9A98B88F54BB 155 1428899388 880594 1453945520 880594 {\n "invalidCells" : {\n "27624917" : "22/... 2 Harbour Town - Docklands Dve - Docklands 60000 9 14 1453985105 [{"address":"","city":"","state":"","zip":""},...
1 156 52739A59-E034-436B-A613-E7A5F62448C0 156 1428899388 880594 1453945520 880594 {\n "invalidCells" : {\n "27624917" : "22/... 4 Federation Square - Flinders St / Swanston St ... 60001 11 11 1453985105 [{"address":"","city":"","state":"","zip":""},...
2 157 7EFB5219-2764-47CE-A497-B6D872823BBE 157 1428899388 880594 1453945520 880594 {\n "invalidCells" : {\n "27624917" : "22/... 5 Plum Garland Reserve - Beaconsfield Pde - Albe... 60002 16 1 1453985105 [{"address":"","city":"","state":"","zip":""},...
3 158 4B09D743-FFEE-4185-B968-A7866E45FE0B 158 1428899388 880594 1453945520 880594 {\n "invalidCells" : {\n "27624917" : "22/... 6 State Library - Swanston St / Little Lonsdale ... 60003 9 2 1453985105 [{"address":"","city":"","state":"","zip":""},...
4 159 C5493850-5714-40E2-9A67-DA443A4F64F8 159 1428899388 880594 1453945520 880594 {\n "invalidCells" : {\n "27624917" : "22/... 7 Bourke Street Mall - 205 Bourke St - City 60004 10 1 1453985105 [{"address":"","city":"","state":"","zip":""},...
if you want to add the column names then do something like this:
import pandas as pd
mbs = pd.read_json('Melbourne_bike_share.json', typ='series')
mbs_columns = mbs['meta']['view']['columns']
names = [item['name'] for item in mbs_columns]
df = pd.DataFrame(data=mbs['data'],columns=names)
Upvotes: 1