Julie

Reputation: 77

JSONDecodeError: Expecting value: line 1 column 1 (char 0) when scraping SEC EDGAR

My code is as follows:

import requests
import urllib
from bs4 import BeautifulSoup

year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
year_content = requests.get(year_url)
decoded_year_url = year_content.json()

I could run exactly the same code last year, but when I ran it yesterday, this error was raised: "JSONDecodeError: Expecting value: line 1 column 1 (char 0)". Why? How can I solve the problem? Thanks a lot!

Upvotes: 3

Views: 665

Answers (2)

visdev

Reputation: 64

Try importing the json module and using json.loads():

import json

import requests

year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
year_content = requests.get(year_url)
# json.loads() expects a str/bytes, not a Response object, so pass the body text
decoded_year_url = json.loads(year_content.text)

Upvotes: -1

BrokenBenchmark

Reputation: 19223

Apparently the SEC has added rate limiting to its website, according to this GitHub issue from May 2021. You're receiving the error because the response body is HTML rather than JSON, so calling .json() on it raises a JSONDecodeError.
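A minimal sketch (standard library only) of why the message points at "line 1 column 1 (char 0)": an HTML page starts with "<", which cannot begin any JSON value, so the parser fails on the very first character. The helper parse_json_body and the placeholder page are hypothetical, for illustration only; with requests you would pass it response.text and response.headers.get("Content-Type", "").

```python
import json

# Placeholder HTML body for illustration; this is NOT the page the SEC actually returns.
html_page = "<!DOCTYPE html>\n<html><body>...</body></html>"


def parse_json_body(body: str, content_type: str = "") -> object:
    """Hypothetical helper: parse `body` as JSON, failing with a clearer message.

    `body` and `content_type` stand in for `response.text` and
    `response.headers.get("Content-Type", "")` from a requests response.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Include the start of the body so an HTML error page is easy to spot.
        snippet = body[:60].replace("\n", " ")
        raise ValueError(
            f"expected JSON, got {content_type or 'unknown content type'}: "
            f"{snippet!r} ({exc})"
        ) from exc


if __name__ == "__main__":
    try:
        json.loads(html_page)
    except json.JSONDecodeError as exc:
        # Prints: Expecting value: line 1 column 1 (char 0)
        print(exc)
```

Printing the first few characters of the body (or the Content-Type header) before decoding is usually the quickest way to tell a JSON problem apart from an HTML error page.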

To resolve this, add a User-Agent header to your request. I can access the JSON with the following:

import requests

year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
# Declare who is making the request; replace the placeholder with your own
# identification (for example, a name and a contact email address).
headers = {"User-Agent": "[specify user agent here]"}
year_content = requests.get(year_url, headers=headers)
decoded_year_url = year_content.json()
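If you'd rather not depend on requests, the same fix can be sketched with the standard library's urllib.request. The User-Agent string below is a placeholder assumption, modeled on the SEC's guidance that automated tools identify themselves with a name and a contact email; build_edgar_request and fetch_json are hypothetical names, not part of any library.

```python
import json
import urllib.request

# Placeholder identity (assumption): replace with your own name/company and contact email.
USER_AGENT = "Sample Company Name admin@example.com"

YEAR_URL = "https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"


def build_edgar_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url: str, user_agent: str = USER_AGENT, timeout: float = 30.0) -> object:
    """Download `url` and decode the body as JSON (this performs a network call)."""
    request = build_edgar_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Usage would be decoded_year_url = fetch_json(YEAR_URL). The point in both versions is the same: send an explicit User-Agent instead of the library default (urllib sends "Python-urllib/3.x", requests sends "python-requests/x.y.z").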

Upvotes: 0
