Reputation: 77
My code is as follows:
import requests
import urllib
from bs4 import BeautifulSoup
year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
year_content = requests.get(year_url)
decoded_year_url = year_content.json()
I could run exactly the same code last year, but when I ran it yesterday, this error appeared: "JSONDecodeError: Expecting value: line 1 column 1 (char 0)". Why does this happen, and how can I solve the problem? Thanks a lot!
Upvotes: 3
Views: 665
Reputation: 64
Try importing the json module and using json.loads():
import requests
import urllib
from bs4 import BeautifulSoup
import json
year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
year_content = requests.get(year_url)
decoded_year_url = json.loads(year_content.text)  # json.loads() needs the body text, not the Response object
Upvotes: -1
Reputation: 19223
Apparently the SEC has added rate-limiting to their website, according to this GitHub issue from May 2021. The reason why you're receiving the error message is that the response contains HTML, rather than JSON, which causes requests to raise an error upon calling .json().
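If you want to confirm this yourself, you can inspect what came back before parsing it. The helper below is only a sketch: diagnose_body is a hypothetical name (it is not part of requests), and the sample input is made up for illustration, not captured from sec.gov.

```python
import json


def diagnose_body(status_code, content_type, body):
    """Return None if body parses as JSON, otherwise a short explanation."""
    try:
        json.loads(body)
        return None
    except json.JSONDecodeError as exc:
        # Show the start of the body so an HTML page is easy to spot.
        preview = body[:80].replace("\n", " ")
        return (
            f"HTTP {status_code}, Content-Type {content_type!r}: "
            f"{exc.msg} at char {exc.pos}; body starts with {preview!r}"
        )


# Made-up HTML body for illustration only (not an actual SEC response).
print(diagnose_body(403, "text/html", "<html><body>Request blocked</body></html>"))
```

With the code from the question, the call would be diagnose_body(year_content.status_code, year_content.headers.get("Content-Type"), year_content.text). An HTML body fails at the very first character, which matches the "Expecting value: line 1 column 1 (char 0)" message.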
To resolve this, you need to add the User-agent header to your request. I can access the JSON with the following:
import requests
import urllib
from bs4 import BeautifulSoup
year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
year_content = requests.get(year_url, headers={'User-agent': '[specify user agent here]'})
decoded_year_url = year_content.json()
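If you make this request in several places, it may help to wrap it so that a non-JSON reply fails with a clearer message. This is a rough sketch under a few assumptions: fetch_json is a hypothetical helper name, session is expected to be a requests.Session (or anything with a compatible get method), and the 30-second timeout is an arbitrary choice.

```python
def fetch_json(session, url, user_agent, timeout=30):
    """GET url with a User-Agent header and return the decoded JSON."""
    response = session.get(url, headers={"User-Agent": user_agent}, timeout=timeout)
    # Raise on 4xx/5xx status codes before attempting to parse the body.
    response.raise_for_status()
    try:
        return response.json()
    except ValueError as exc:
        # The decode errors raised by .json() are ValueError subclasses.
        content_type = response.headers.get("Content-Type")
        raise ValueError(
            f"Expected JSON from {url}, got Content-Type {content_type!r}; "
            f"body starts with {response.text[:80]!r}"
        ) from exc


# Example usage (makes a real network request, so it is left commented out):
# import requests
# year_url = "https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
# decoded_year_url = fetch_json(requests.Session(), year_url, "[specify user agent here]")
```

Reusing one Session also means the header logic lives in one place if you go on to download the individual index files.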
Upvotes: 0