silpa

Reputation: 57

Parsing SEC tabular data

I need to parse tabular data from SEC filings, and I'm using Python for it. I found that the tabular data is stored in XBRL format. At first I tried to parse the XBRL data the same way we parse XML, using the lxml module. Later I realized that XBRL is a complex model to parse and that there are many libraries for parsing XBRL documents. I've tried several libraries, such as python-xbrl and xbrl, and I also installed a server (RaptorXML+XBRL Server) for parsing XBRL documents, but none of them worked as expected. As I mentioned, my goal is to get the tabular data from the SEC. Sample documents can be found in this link. Can you please suggest a process or module for parsing the tabular data? Thanks in advance.

Upvotes: 0

Views: 1198

Answers (1)

Jack Fleeting

Reputation: 24940

Like you, I tried parsing XBRL documents with the tools available in Python, without much success. One way to work around the problem is to go to the HTML filing underlying the XBRL filing.

So, to use your example link, the URL of the first 10-K there is

https://www.sec.gov/ix?doc=/Archives/edgar/data/1551152/000155115220000007/abbv-20191231x10k.htm

Simply strip the /ix?doc= string from the URL, and you are left with

https://www.sec.gov/Archives/edgar/data/1551152/000155115220000007/abbv-20191231x10k.htm

which is the same 10-K filing, but in HTML format. From there you can use your normal HTML tools to extract whatever data you are interested in.
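As a rough sketch of the steps above (not a definitive implementation): the first function removes the /ix?doc= part of the viewer URL, and the second downloads the plain HTML page and hands it to pandas. The function names `to_plain_html_url` and `fetch_tables` and the `user_agent` parameter are my own, made up for illustration. The sketch assumes pandas is installed together with an HTML parser backend such as lxml, and that you pass a User-Agent string identifying yourself, since sec.gov may reject requests without one.

```python
import io
import urllib.request


def to_plain_html_url(viewer_url):
    """Turn an inline-viewer URL into the URL of the underlying HTML filing."""
    # Remove the first occurrence of the viewer prefix; other URLs pass through unchanged.
    return viewer_url.replace("/ix?doc=", "", 1)


def fetch_tables(viewer_url, user_agent):
    """Download the HTML filing and return every <table> as a pandas DataFrame."""
    import pandas as pd  # imported here so the URL helper works without pandas

    request = urllib.request.Request(
        to_plain_html_url(viewer_url),
        headers={"User-Agent": user_agent},  # assumption: your own contact string
    )
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    # read_html returns a list with one DataFrame per table found in the page
    return pd.read_html(io.StringIO(html))


# Example usage (makes a network request, so it is left commented out):
# tables = fetch_tables(
#     "https://www.sec.gov/ix?doc=/Archives/edgar/data/1551152/"
#     "000155115220000007/abbv-20191231x10k.htm",
#     user_agent="Your Name your.email@example.com",
# )
# print(len(tables))
```

A filing of this size typically contains many tables, a lot of them used only for layout, so you will probably still need to filter the returned list down to the ones you care about.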

Upvotes: 3
