Reputation: 623
I have a Python script that reads the contents of a JSON file and imports them into MongoDB.
It fails with the following error:
Traceback (most recent call last):
  File "/home/luke/projects/vuln_backend/vuln_backend/mongodb.py", line 39, in process_files
    file_content = currentFile.read()
  File "/home/luke/envs/vuln_backend/lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 14: invalid continuation byte
This is the code:
import json
import logging
import logging.handlers
import os
import glob
from logging.config import fileConfig
from zipfile import ZipFile
from pymongo import MongoClient
def process_files():
    try:
        client = MongoClient('5.57.62.97', 27017)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + '/vuln_files/'
        archive_files = glob.glob(archive_filepath + "/*.zip")
        for file in archive_files:
            with open(file, "r") as currentFile:
                file_content = currentFile.read()
                vuln_content = json.loads(file_content)
                for item in vuln_content:
                    coll.insert(item)
    except Exception as e:
        logging.exception(e)
I have tried setting the encoding to UTF-8 and Windows-1252, but neither of these can read the JSON either.
How can I determine which encoding the JSON uses?
Reputation: 402493
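Notice that you are trying to parse a zipped file as JSON. Your glob matches *.zip, so open(file, "r") tries to decode the compressed bytes of the archive as UTF-8 text, and that is what raises the UnicodeDecodeError. You'll have to unzip the archive first, which you can do with the zipfile module, like this: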
with ZipFile(file, 'r') as f:
    f.extractall(dest)
Here, file is the loop variable from your code, and dest is the directory to extract the archive into.
Furthermore, when reading a JSON file, I'd recommend calling json.load(fileobj) (one step) rather than reading the file contents and then calling json.loads(string_from_file) on the resulting string (two steps).
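Putting both suggestions together, here is a minimal sketch that reads the JSON members straight out of an archive without extracting them to disk. The helper name load_json_from_zip is hypothetical, and the sketch assumes every .json member is encoded as UTF-8, UTF-16 or UTF-32, which are the encodings json.load detects when given bytes (Python 3.6+).

```python
import json
from zipfile import ZipFile


def load_json_from_zip(zip_path):
    """Parse every .json member of a zip archive and return the results.

    Hypothetical helper: assumes each member is a JSON document encoded
    as UTF-8, UTF-16 or UTF-32.
    """
    documents = []
    with ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue  # skip directories and non-JSON members
            # ZipFile.open returns a binary file object, which json.load
            # can consume directly (one step, no intermediate string).
            with archive.open(name) as member:
                documents.append(json.load(member))
    return documents
```

In your loop you could call load_json_from_zip(file) for each archive and insert the items of each returned document into the collection as before.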