JKR

Reputation: 109

How to get data from SEC EDGAR with Python and a JSON link

The following page lists a JSON link as its data source: https://www.sec.gov/edgar/browse/?CIK=1067983&owner=exclude (Data source: CIK0001067983.json -> https://data.sec.gov/submissions/CIK0001067983.json)
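The JSON link appears to be the browse page's CIK (1067983) left-padded with zeros to 10 digits. A minimal sketch of building it, assuming that padding rule holds for every CIK (it is inferred from this single example only); `submissions_url` is a hypothetical helper name:

```python
def submissions_url(cik: int) -> str:
    """Build the data.sec.gov submissions URL for a numeric CIK.

    Assumption: the CIK is zero-padded to 10 digits, as in CIK0001067983.
    """
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


print(submissions_url(1067983))  # https://data.sec.gov/submissions/CIK0001067983.json
```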

This is my code (it runs without errors):

import requests

headers = {
    "Host": "www.sec.gov",
    "User-Agent": "jo boulement [email protected]",
    "Accept-Encoding": "gzip, deflate",
}

sec_url = "https://data.sec.gov/submissions/CIK0001067983.json"
resp = requests.get(sec_url, headers=headers)

# Save the raw response body to disk
with open("e:\\sec_api_of_1448574_7.html", "w", encoding="utf-8") as my_file:
    my_file.write(resp.text)

But the resulting file looks like this:

Error 404: Page Not Found Oops! Page Not Found.

What is going wrong here? The JSON link https://data.sec.gov/submissions/CIK0001067983.json itself is fine, because downloading it by hand from the page works. I hope somebody can give me a hint. Thanks!

Upvotes: 5

Views: 3298

Answers (4)

Pandem1c

Reputation: 878

If you don't mind a plug: I have built a service that parses EDGAR filings into useful JSON. With an API key you can request any SEC filing and get its JSON version.

Check out the service at https://www.edgar-json.com/ and hit me up if you want to try it out!

Upvotes: 0

JKR

Reputation: 109

Thanks for your help, I have found the solution.

The sec.gov documentation at https://www.sec.gov/os/webmaster-faq#user-agent shows which request headers to send, including "Host".

But the "Host" header led to the "404 Page Not Found" error.

This header, however, works fine:

headers = {
    "User-Agent": "jo boulement [email protected]",
    "Accept-Encoding": "gzip, deflate",
}

Crazy, because the documentation says something else. :(
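One plausible reason the "Host" header breaks things: the URL points at data.sec.gov, but an explicit `Host: www.sec.gov` header replaces the host name that would otherwise be derived from the URL, so the server presumably looks for /submissions/... on www.sec.gov, where it does not exist. That routing explanation is an assumption; the sketch only prepares both requests (without sending them) so their headers can be compared offline:

```python
from urllib.parse import urlparse

import requests

url = "https://data.sec.gov/submissions/CIK0001067983.json"

# Headers from the question: "Host" names www.sec.gov, but the URL is on data.sec.gov.
bad_headers = {
    "Host": "www.sec.gov",
    "User-Agent": "jo boulement [email protected]",
    "Accept-Encoding": "gzip, deflate",
}
# Headers from the working version: no explicit Host header.
good_headers = {
    "User-Agent": "jo boulement [email protected]",
    "Accept-Encoding": "gzip, deflate",
}

# Build the requests without sending them, so nothing goes over the network.
bad = requests.Request("GET", url, headers=bad_headers).prepare()
good = requests.Request("GET", url, headers=good_headers).prepare()

print(urlparse(bad.url).hostname)  # data.sec.gov (where the connection goes)
print(bad.headers["Host"])         # www.sec.gov (the host the server is told about)
# Without an explicit Host, the HTTP layer (urllib3/http.client) adds one
# from the URL when the request is actually sent.
print("Host" in good.headers)      # False
```

If the FAQ's sample shows `Host: www.sec.gov`, that value would only match requests to www.sec.gov URLs; for data.sec.gov it would need to be `Host: data.sec.gov`, or simply left out.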

Upvotes: 6

D Malan

Reputation: 11414

A web server checks the headers that you send in your request and might decide to return an error page if you don't include certain headers. In this case, it looks like they return an error if you don't include a valid user agent.

This works for me:

import requests

headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}

url = "https://data.sec.gov/submissions/CIK0001067983.json"

# A GET request needs no body, so no payload is passed
response = requests.get(url, headers=headers)

print(response.text)
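A slightly more defensive variant, sketched under assumptions: it sends a descriptive "name email" User-Agent (the form used in the question, and as far as I understand the form SEC's webmaster FAQ asks for) instead of a browser string, raises on HTTP errors instead of printing an error page, and parses the JSON. The field names `filings`, `recent`, `form`, `filingDate` and `accessionNumber` are assumptions about the submissions JSON layout, and `fetch_submissions` / `recent_filings` are hypothetical helper names:

```python
from __future__ import annotations

import requests


def fetch_submissions(url: str, user_agent: str) -> dict:
    """Download an EDGAR submissions JSON document and return it as a dict."""
    # No explicit Host header: it is derived from the URL when the request is sent.
    headers = {"User-Agent": user_agent, "Accept-Encoding": "gzip, deflate"}
    response = requests.get(url, headers=headers, timeout=30)
    # Fail loudly on 403/404 instead of silently handling an HTML error page.
    response.raise_for_status()
    return response.json()


def recent_filings(submissions: dict, form_type: str) -> list[tuple[str, str]]:
    """Return (filingDate, accessionNumber) pairs for one form type.

    Assumes filings -> recent holds parallel lists keyed by field name.
    """
    recent = submissions["filings"]["recent"]
    return [
        (date, accession)
        for form, date, accession in zip(
            recent["form"], recent["filingDate"], recent["accessionNumber"]
        )
        if form == form_type
    ]
```

For example, `recent_filings(fetch_submissions("https://data.sec.gov/submissions/CIK0001067983.json", "Your Name your.email@example.com"), "13F-HR")` should list the matching filings, provided the assumed field names match the real response.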

Upvotes: 2

CryptoFool
CryptoFool

Reputation: 23089

The HTML that gets returned includes this <script> tag:

<script src="/files/js/js_DkdESgtfPfV7guog-Lhz7nda0K-ISZe0-gHU4CF6Wo0.js"></script>

My guess is that the script referenced by the tag is what causes the JSON data to be returned. A browser will run that script as part of rendering the HTML. The Requests package doesn't do this. It just returns the raw HTML. You might need to use something like Puppeteer or Selenium to get the JSON via that URL.

Upvotes: -2
