Luke

Reputation: 623

Python JSON encoding error

I have a Python script that reads the contents of a JSON file and imports them into MongoDB.

I am getting the following error from it:

Traceback (most recent call last):
  File "/home/luke/projects/vuln_backend/vuln_backend/mongodb.py", line 39, in process_files
    file_content = currentFile.read()
  File "/home/luke/envs/vuln_backend/lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 14: invalid continuation byte

This is the code:

import json
import logging
import logging.handlers
import os
import glob
from logging.config import fileConfig
from zipfile import ZipFile
from pymongo import MongoClient


def process_files():
    try:
        client = MongoClient('5.57.62.97', 27017)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + '/vuln_files/'
        archive_files = glob.glob(archive_filepath + "/*.zip")

        for file in archive_files:
            with open(file, "r") as currentFile:
                file_content = currentFile.read()
                vuln_content = json.loads(file_content)
            for item in vuln_content:
                coll.insert(item)
    except Exception as e:
        logging.exception(e)

I have tried setting the encoding to UTF-8 and Windows-1252, but neither seems able to read the JSON.

How can I get it to determine which encoding is used in the JSON?

Upvotes: 1

Views: 2069

Answers (1)

cs95

Reputation: 402493

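Notice that you are opening the zipped file itself and passing its raw contents to json.loads. A zip archive is binary data, not text, which is why decoding it as UTF-8 fails. You'll have to unzip it first, which you can do with the zipfile module, like this: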

with ZipFile(file, 'r') as f:
    f.extractall(dest)

Where file is the loop variable and dest is the directory to extract into.

Furthermore, when reading a JSON file, I'd recommend using json.load(fileobj) (one step) rather than reading the file contents and then calling json.loads(string_from_file) on the resulting string (two steps).
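
Putting both points together, here is a minimal sketch that reads the JSON straight out of each archive without extracting to disk. The function name load_json_from_zip is made up for illustration, and it assumes that every archive member ending in .json holds a single JSON document encoded as UTF-8, UTF-16 or UTF-32 (on Python 3.6+, json.load accepts a binary file object and detects which of those is in use). If your files really are in another encoding such as Windows-1252, you would need to decode them explicitly instead.

```python
import json
from zipfile import ZipFile


def load_json_from_zip(zip_path):
    """Return the parsed JSON documents found in a zip archive.

    Hypothetical helper: assumes each member ending in .json contains
    one JSON document in UTF-8, UTF-16 or UTF-32.
    """
    documents = []
    with ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            # Skip directories and anything that is not a JSON file.
            if not name.lower().endswith(".json"):
                continue
            # archive.open() yields a binary file object for the member;
            # json.load() reads it and parses it in one step.
            with archive.open(name) as member:
                documents.append(json.load(member))
    return documents
```

In the loop from the question, you could call load_json_from_zip(file) for each archive and then iterate over the returned documents, inserting their items as before.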

Upvotes: 1
