Reputation: 281
Sample data:
{
"_id": "OzE5vaa3p7",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "nebCwWd2Fr"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/shopgro-1376.appspot.com/o/Barcode%20Data%20Upload%28II%29%2FAnil_puttu_flour_500g.png?alt=media&token=9cf63197-0925-4360-a31a-4675f4f46ae2",
"barcode": "8908001921015",
"isFmcg": true,
"itemName": "Anil puttu flour 500g",
"mrp": 58,
"_created_at": "2016-10-02T13:49:03.281Z",
"_updated_at": "2017-02-22T08:48:09.548Z"
}
{
"_id": "ENPCL8ph1p",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "B4nZeUHmVK"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/kirananearby-9eaa8.appspot.com/o/Barcode%20data%20upload%2FYippee_Magic_Masala_Noodles,_70_g.png?alt=media&token=d9e47bd7-f847-4d6f-9460-4be8dbcaae00",
"barcode": "8901725181222",
"isFmcg": true,
"itemName": "Yippee Magic Masala Noodles, 70 G",
"mrp": 12,
"_created_at": "2016-10-02T13:49:03.284Z",
"_updated_at": "2017-02-22T08:48:09.074Z"
}
I tried:
import pandas as pd
data = pd.read_json('Data.json')
which fails with ValueError: Expected object or value
I also tried:
import json
with open('gdb.json') as datafile:
    data = json.load(datafile)
retail = pd.DataFrame(data)
error: json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 509)
and reading the file line by line:
with open('gdb.json') as datafile:
    for line in datafile:
        data = json.loads(line)
retail = pd.DataFrame(data)
error: json.decoder.JSONDecodeError: Extra data: line 1 column 577 (char 576)
How can I read this JSON into pandas?
Upvotes: 28
Views: 153201
Reputation: 13
You can't have two separate dictionaries at the top level of one JSON file. You need a containing list or dictionary.
Put the following in the .json file:
[
{
"_id": "OzE5vaa3p7",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "nebCwWd2Fr"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/shopgro-1376.appspot.com/o/Barcode%20Data%20Upload%28II%29%2FAnil_puttu_flour_500g.png?alt=media&token=9cf63197-0925-4360-a31a-4675f4f46ae2",
"barcode": "8908001921015",
"isFmcg": true,
"itemName": "Anil puttu flour 500g",
"mrp": 58,
"_created_at": "2016-10-02T13:49:03.281Z",
"_updated_at": "2017-02-22T08:48:09.548Z"
},
{
"_id": "ENPCL8ph1p",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "B4nZeUHmVK"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/kirananearby-9eaa8.appspot.com/o/Barcode%20data%20upload%2FYippee_Magic_Masala_Noodles,_70_g.png?alt=media&token=d9e47bd7-f847-4d6f-9460-4be8dbcaae00",
"barcode": "8901725181222",
"isFmcg": true,
"itemName": "Yippee Magic Masala Noodles, 70 G",
"mrp": 12,
"_created_at": "2016-10-02T13:49:03.284Z",
"_updated_at": "2017-02-22T08:48:09.074Z"
}
]
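Now you can access the data like this:
import pandas as pd
data = pd.read_json('Data.json')
# read_json returns a DataFrame, so rows are selected by position with .iloc
# To get the first record as a dictionary
dict1 = data.iloc[0].to_dict()
# To get the second record as a dictionary
dict2 = data.iloc[1].to_dict()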
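If editing the file by hand is not practical, the same wrapping can be done in code. This is only a minimal sketch, assuming the file holds several JSON objects written one after another as in the question. The helper name iter_json_objects is made up for illustration; json.JSONDecoder.raw_decode is the standard-library call doing the work.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from text that holds several
    values back to back (json.loads rejects this with "Extra data")."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode returns the parsed value and the index where it ended
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened, made-up version of the two records from the question
sample = '{"_id": "OzE5vaa3p7", "mrp": 58}\n{"_id": "ENPCL8ph1p", "mrp": 12}'
records = list(iter_json_objects(sample))
print(records[0]["mrp"], records[1]["mrp"])
```

The resulting list of dictionaries can then be passed to pd.DataFrame(records).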
Thank you!
Upvotes: 0
Reputation: 1243
I got the same error. It is worth reading the documentation for read_json and experimenting with its parameters.
I solved it by using the one below:
data = pd.read_json('Data.json', lines=True)
You can also try other combinations, for example:
data = pd.read_json('Data.json', lines=True, orient='records')
data = pd.read_json('Data.json', orient='records')  # orient takes a string such as 'records', 'split' or 'index'
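For context, lines=True makes read_json treat every line as a separate JSON object, so it suits files where each record sits on its own line. A small sketch of that layout, using made-up shortened records and io.StringIO in place of a real file:

```python
import io

import pandas as pd

# Two records, one complete JSON object per line (JSON Lines layout).
jsonl_text = (
    '{"itemName": "Anil puttu flour 500g", "mrp": 58}\n'
    '{"itemName": "Yippee Magic Masala Noodles, 70 G", "mrp": 12}\n'
)

# StringIO stands in for a file path here.
df_lines = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(df_lines)
```

If a record is pretty-printed across several lines, this option will most likely not help, because each individual line is then not a complete JSON object.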
Upvotes: 18
Reputation: 520
Often the JSON comes in the following format (for those who are still searching for a solution):
{col1:'val1', col2:'val2'}{col1:'val1', col2:'val2'}{col1:'val1', col2:'val2'}
🖖🏻 As you can see, there are a few issues here:
0. Add the square brackets if they are not already there
Add [ at the beginning of the JSON and ] at the end, which is just a matter of pressing the Home and End keys on your keyboard 😊
1. Replace single quotes with double quotes
import re
# either this (simple): replace every single quote not preceded by a backslash
data = re.sub(r"(?<!\\)'", '"', data)
# or this instead, which requotes only the values and leaves quotes inside them alone
# data = re.sub(r"(?<=:)\s*'(.*?)'\s*(?=,|\n|})", r'"\1"', data)
This assumes the JSON data is a string stored in the data variable.
2. Put double quotes around the keys
data = re.sub(r'(\w+)(?=:)', r'"\1"', data)
3. Put each record on its own line, separated by commas
import pandas as pd
data = re.sub(r'}\s*{', '},\n{', data)
with open("ABC.json", "w") as file:
    file.write(data)
df = pd.read_json(r"./ABC.json")
We are done. We have the clean JSON like this:
[
{"col1":"val1", "col2":"val2"},
{"col1":"val1", "col2":"val2"},
{"col1":"val1", "col2":"val2"}
]
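The steps above can be put together in one function. This is only a sketch under stated assumptions: the input has no surrounding brackets yet, and no value contains a colon or an apostrophe (a URL would break step 2). The function name repair_loose_json is made up for illustration.

```python
import json
import re


def repair_loose_json(text):
    """Apply the steps above to text like {col1:'val1'}{col1:'val2'}.

    Only a sketch: it will corrupt values that contain colons or
    apostrophes (URLs, for instance).
    """
    text = re.sub(r"(?<!\\)'", '"', text)        # step 1: single -> double quotes
    text = re.sub(r'(\w+)(?=:)', r'"\1"', text)  # step 2: quote the keys
    text = re.sub(r'\}\s*\{', '},\n{', text)     # step 3: comma and newline between records
    return '[' + text.strip() + ']'              # step 0: wrap everything in a list


loose = "{col1:'val1', col2:'val2'}{col1:'val1', col2:'val2'}"
records = json.loads(repair_loose_json(loose))
print(records)
```

Parsing the result with json.loads right away is a cheap check that the repaired text really is valid JSON before it is written to disk.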
Upvotes: 0
Reputation: 1
It seems like a million things can cause this. In my case, my JSON file started with a byte order mark, shown as [BOM] [unix] in vim-airline. I don't know what the byte order mark is for or when it would be needed. To remove it, I ran :set nobomb in vim and then saved the file. After that, pandas could read it and I was good to go.
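For anyone without vim at hand: a UTF-8 byte order mark is the three bytes EF BB BF that some editors write at the very start of a file. A minimal sketch of removing it in Python follows; the function name strip_utf8_bom is made up, and it assumes the file is UTF-8 and small enough to read into memory.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 byte order mark.

    Returns True if a BOM was found and removed, False otherwise.
    """
    raw = Path(path).read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the three BOM bytes and write the rest back unchanged
        Path(path).write_bytes(raw[len(codecs.BOM_UTF8):])
        return True
    return False
```

Calling strip_utf8_bom('Data.json') once should have the same effect on the file as :set nobomb followed by a save.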
Upvotes: 0
Reputation: 41
I just solved this problem by adding a "/" at the beginning of the absolute path.
import pandas as pd
pd_from_json = pd.read_json("/home/miguel/folder/information.json")
Upvotes: 0
Reputation: 11
The error ValueError: All arrays must be of the same length, which happens with
df = pd.read_json(r'./filename.json')
can be solved by changing the line above to the following:
df = pd.read_json(r'./filename.json', lines=True)
Upvotes: 0
Reputation: 11
If you type in the absolute path of the file and use \ as the separator, it should work. At least that's how I fixed the issue.
Upvotes: 1
Reputation: 19
This worked for me: pd.read_json('./dataset/healthtemp.json', typ="series")
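A guess at why this can help: typ="series" asks read_json for a Series instead of a DataFrame, which fits a file that is one flat object of scalar values. The contents of healthtemp.json are not shown, so the data below is made up purely for illustration.

```python
import io

import pandas as pd

# Made-up flat JSON object; the real healthtemp.json is not shown in the thread.
flat_text = '{"temperature": 98.6, "heart_rate": 72}'

# typ="series" maps the object's keys to the index and its values to the data.
series = pd.read_json(io.StringIO(flat_text), typ="series")
print(series)
```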
Upvotes: 0
Reputation: 1779
Another variation, combining tips from the thread. Each of them failed on its own, but this worked for me:
pd.read_json('file.json', lines=True, encoding = 'utf-8-sig')
Upvotes: 1
Reputation: 151
I faced the same problem. The reason is that the JSON file contains something that does not follow the JSON rules. In my case, I had used single quotes instead of double quotes in one of the values.
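To find where a file breaks the rules, the standard json module can report the position of the first problem. A small sketch with a made-up record that has a single-quoted value:

```python
import json

# Made-up record: the value of itemName uses single quotes, which JSON forbids.
bad_text = '{"itemName": \'Anil puttu flour 500g\', "mrp": 58}'

try:
    json.loads(bad_text)
    error_position = None
except json.JSONDecodeError as err:
    # lineno and colno are 1-based; pos is the 0-based character offset
    error_position = (err.lineno, err.colno, err.pos)
    print(f"{err.msg} at line {err.lineno}, column {err.colno}")
```

The reported offset points at the first character the parser could not accept, here the opening single quote.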
Upvotes: 4
Reputation: 1724
I encountered this error message today, and in my case the problem was that the encoding of the text file was UTF-8-BOM instead of UTF-8, which is the default for read_json(). This can be solved by specifying the encoding:
data= pd.read_json('Data.json', encoding = 'utf-8-sig')
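The effect of the encoding can be reproduced with the standard library alone. The sketch below writes a temporary file that starts with a BOM (made-up contents) and parses it with both encodings:

```python
import json
import os
import tempfile

# Write a small file that starts with a UTF-8 byte order mark.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8-sig"
) as tmp:
    tmp.write('[{"mrp": 58}, {"mrp": 12}]')
    bom_path = tmp.name

# Plain UTF-8 keeps the BOM as a stray first character, which json rejects.
try:
    with open(bom_path, encoding="utf-8") as fh:
        json.load(fh)
    plain_utf8_ok = True
except json.JSONDecodeError:
    plain_utf8_ok = False

# utf-8-sig drops the BOM while decoding, so parsing succeeds.
with open(bom_path, encoding="utf-8-sig") as fh:
    rows = json.load(fh)

os.remove(bom_path)
print(plain_utf8_ok, rows)
```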
Upvotes: 10
Reputation: 11
If you try the code below, it will solve the problem:
data_set = pd.read_json(r'json_file_address\file_name.json', lines=True)
Upvotes: 1
Reputation: 17
You can try changing the relative path to an absolute path. In your situation, change
import pandas as pd
data = pd.read_json('Data.json')
to
import pandas as pd
data = pd.read_json('C://Data.json')  # the absolute path shown in Explorer
I got the same error when I moved the same code from a Jupyter notebook to PyCharm's Jupyter notebook console.
Upvotes: 1
Reputation: 11
Keep your path simple; it makes the data easier to read. For example, put the file on your desktop and pass that path when reading the data. It works.
Upvotes: 1
Reputation: 2133
Your JSON is malformed.
ValueError: Expected object or value can occur if you mistyped the file name. Does Data.json exist? I noticed that in your other attempts you used gdb.json.
Once you confirm the file name is correct, you have to fix your JSON. What you have now is two disconnected records separated by whitespace. Lists in JSON must be valid arrays inside square brackets, with elements separated by commas: [{record1}, {record2}, ...]
Also, to load it with the split orientation used further down, put your array under a root element called "data":
{ "data": [ {record1}, {record2}, ... ] }
Your JSON should end up looking like this:
{"data":
[{
"_id": "OzE5vaa3p7",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "nebCwWd2Fr"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/shopgro-1376.appspot.com/o/Barcode%20Data%20Upload%28II%29%2FAnil_puttu_flour_500g.png?alt=media&token=9cf63197-0925-4360-a31a-4675f4f46ae2",
"barcode": "8908001921015",
"isFmcg": true,
"itemName": "Anil puttu flour 500g",
"mrp": 58,
"_created_at": "2016-10-02T13:49:03.281Z",
"_updated_at": "2017-02-22T08:48:09.548Z"
}
,
{
"_id": "ENPCL8ph1p",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "B4nZeUHmVK"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/kirananearby-9eaa8.appspot.com/o/Barcode%20data%20upload%2FYippee_Magic_Masala_Noodles,_70_g.png?alt=media&token=d9e47bd7-f847-4d6f-9460-4be8dbcaae00",
"barcode": "8901725181222",
"isFmcg": true,
"itemName": "Yippee Magic Masala Noodles, 70 G",
"mrp": 12,
"_created_at": "2016-10-02T13:49:03.284Z",
"_updated_at": "2017-02-22T08:48:09.074Z"
}]}
Finally, pandas calls this format split orientation, so you have to load it as follows:
df = pd.read_json('gdb.json', orient='split')
df now contains the following data frame:
_id categories isActive imageUrl barcode isFmcg itemName mrp _created_at _updated_at
0 OzE5vaa3p7 [{'__type': 'Pointer', 'className': 'Category', 'objectI... True https://firebasestorage.googleapis.com/v0/b/shopgro-1376... 8908001921015 True Anil puttu flour 500g 58 2016-10-02 13:49:03.281000+00:00 2017-02-22 08:48:09.548000+00:00
1 ENPCL8ph1p [{'__type': 'Pointer', 'className': 'Category', 'objectI... True https://firebasestorage.googleapis.com/v0/b/kirananearby... 8901725181222 True Yippee Magic Masala Noodles, 70 G 12 2016-10-02 13:49:03.284000+00:00 2017-02-22 08:48:09.074000+00:00
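An alternative sketch that does not rely on orient at all: parse the file with the json module and hand the list under "data" to the DataFrame constructor. The records below are shortened, made-up versions of the ones above, and a string stands in for the file contents.

```python
import json

import pandas as pd

# Shortened stand-in for the contents of gdb.json after wrapping in {"data": [...]}
payload_text = (
    '{"data": [{"_id": "OzE5vaa3p7", "mrp": 58},'
    ' {"_id": "ENPCL8ph1p", "mrp": 12}]}'
)

payload = json.loads(payload_text)
# A list of dictionaries becomes one row per dictionary.
df_records = pd.DataFrame(payload["data"])
print(df_records)
```

Unlike read_json, this route leaves timestamp strings such as _created_at as plain text unless they are converted afterwards, for example with pd.to_datetime.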
Upvotes: 21
Reputation: 2035
I am not sure I understood your question correctly. Are you just trying to read JSON data?
I collected your sample data into a list, as shown below:
[
{
"_id": "OzE5vaa3p7",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "nebCwWd2Fr"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/shopgro-1376.appspot.com/o/Barcode%20Data%20Upload%28II%29%2FAnil_puttu_flour_500g.png?alt=media&token=9cf63197-0925-4360-a31a-4675f4f46ae2",
"barcode": "8908001921015",
"isFmcg": true,
"itemName": "Anil puttu flour 500g",
"mrp": 58,
"_created_at": "2016-10-02T13:49:03.281Z",
"_updated_at": "2017-02-22T08:48:09.548Z"
},
{
"_id": "ENPCL8ph1p",
"categories": [
{
"__type": "Pointer",
"className": "Category",
"objectId": "B4nZeUHmVK"
}
],
"isActive": true,
"imageUrl": "https://firebasestorage.googleapis.com/v0/b/kirananearby-9eaa8.appspot.com/o/Barcode%20data%20upload%2FYippee_Magic_Masala_Noodles,_70_g.png?alt=media&token=d9e47bd7-f847-4d6f-9460-4be8dbcaae00",
"barcode": "8901725181222",
"isFmcg": true,
"itemName": "Yippee Magic Masala Noodles, 70 G",
"mrp": 12,
"_created_at": "2016-10-02T13:49:03.284Z",
"_updated_at": "2017-02-22T08:48:09.074Z"
}
]
and ran this code
import pandas as pd
df = pd.read_json('Data.json')
print(df)
Output:-
_created_at ... mrp
0 2016-10-02 13:49:03.281 ... 58
1 2016-10-02 13:49:03.284 ... 12
[2 rows x 10 columns]
Upvotes: 1
Reputation: 382
You should ensure that the terminal's working directory is the same as the directory containing the file. (When this error occurred for me in VS Code, it meant that the terminal's directory was not the same as that of the Python file I wanted to execute.)
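One way to make a script independent of the terminal's directory is to build the data path from the script's own location. A minimal sketch; the helper name path_next_to and the example path are hypothetical:

```python
from pathlib import Path


def path_next_to(script_file, filename):
    """Build a path to `filename` in the same folder as `script_file`,
    independent of the terminal's current working directory."""
    return Path(script_file).resolve().parent / filename


# In a real script, pass __file__; a literal path is used here for illustration.
data_path = path_next_to("/home/user/project/load_data.py", "Data.json")
print("working directory:", Path.cwd())
print("data file path:   ", data_path)
```

Passing str(data_path) to pd.read_json then finds the file no matter where the terminal was opened.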
Upvotes: 6
Reputation: 39
I don't think this is the problem, since it should be the default (I think), but have you tried this? Adding 'r' opens the file in read mode.
import json
import pandas as pd
with open('gdb.json', 'r') as datafile:
    data = json.load(datafile)
retail = pd.DataFrame(data)
Upvotes: 2