Pmmoks

Reputation: 55

Extract information from financial statements from XBRL files in Python?

I am trying to parse XBRL files and extract information from them. There seem to be a number of open source packages that can parse XBRL files in Python, but documentation on how to use them is lacking. The packages I have investigated are:

*For reasons that I won't get into, I'm currently unable to use Brel (essentially because it requires Python >= 3.10).

However, the other packages don't seem to allow me to parse downloaded XBRL files offline. The structure of my downloaded XBRL files is as follows (taking the example of Apple from the SEC filings):

aapl-20200926
 |
 +-- aapl-20200926.xsd
 +-- aapl-20200926_cal.xml
 +-- aapl-20200926_def.xml
 +-- aapl-20200926_lab.xml
 +-- aapl-20200926_pre.xml
 +-- aapl-20200926_htm.xml

I'm aware this isn't the typical purpose of Stack Overflow, but does anyone know how I could start parsing XBRL files in the above structure (rather than from a link on the internet) in a Python script? I expect there is some way to do this using Arelle and its Python API, or with py-xbrl, but I haven't been able to crack it so far.

Upvotes: 2

Views: 3136

Answers (1)

pdw

Reputation: 993

The easiest way to get started with Arelle is to download the complete ZIP of the filing from the SEC. Annoyingly, it's not directly linked from the page you linked to, but you can find it by opening the iXBRL file and going to Menu -> Save XBRL Zip file. Alternatively, replace -index.htm in the filing index URL with -xbrl.zip:

https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/0000320193-20-000096-xbrl.zip
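
The URL substitution can also be done in a script. This is only a minimal sketch: the function name zip_url_from_index is made up for illustration, and the index URL below is an assumption obtained by reversing the same substitution on the ZIP URL above.

```python
def zip_url_from_index(index_url: str) -> str:
    """Turn an EDGAR filing index URL into the matching XBRL ZIP URL."""
    suffix = "-index.htm"
    if not index_url.endswith(suffix):
        raise ValueError("expected a URL ending in -index.htm")
    # Swap the trailing "-index.htm" for "-xbrl.zip".
    return index_url[: -len(suffix)] + "-xbrl.zip"


# Assumed index URL for the Apple filing used in this answer.
index_url = (
    "https://www.sec.gov/Archives/edgar/data/320193/"
    "000032019320000096/0000320193-20-000096-index.htm"
)
zip_url = zip_url_from_index(index_url)
print(zip_url)
```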

To get started, try this command line:

python3 $HOME/Arelle/arelleCmdLine.py -v \
  -f 0000320193-20-000096-xbrl.zip  \
  --plugins 'validate/EFM|saveLoadableOIM' \
  --saveLoadableOIM=aapl.json

This should convert your downloaded file into xBRL-JSON format, saved as aapl.json.
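
Once you have the JSON, plain Python is enough to read it. Here is a minimal sketch, assuming the file follows the xBRL-JSON layout: a top-level "facts" object keyed by fact id, where each fact carries a "value" and a "dimensions" object with "concept", "period" and so on. The helper names and the sample data are made up for illustration, so check the field names against your actual aapl.json.

```python
import json


def load_report(path):
    """Read an xBRL-JSON file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_facts(report):
    """Yield (concept, period, value) for every fact in the report."""
    for fact in report.get("facts", {}).values():
        dims = fact.get("dimensions", {})
        yield dims.get("concept"), dims.get("period"), fact.get("value")


# Tiny hand-made sample in the assumed layout; not real filing data.
sample = {
    "facts": {
        "f1": {
            "value": "100",
            "dimensions": {
                "concept": "ex:Revenue",
                "period": "2020-01-01T00:00:00/2021-01-01T00:00:00",
                "unit": "iso4217:USD",
            },
        },
    },
}

rows = list(iter_facts(sample))
for concept, period, value in rows:
    print(concept, period, value)
```

For the real file, the same loop would run over iter_facts(load_report("aapl.json")).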

The validate/EFM plugin is needed for SEC filings, as they use some custom transforms.

The saveLoadableOIM plugin enables the xBRL-JSON functionality.

To get started with Python, here's a fairly minimal script that dumps out all facts in the report:

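import sys
from arelle import Cntlr, PluginManager
from arelle.ModelFormulaObject import FormulaOptions
import arelle.FileSource

class ArelleLoader(Cntlr.Cntlr):
    """Headless Arelle controller set up for SEC (EFM) filings."""

    def __init__(self):
        # Run without the GUI.
        super().__init__(hasGui=False)
        # SEC filings use custom transforms provided by the EFM plugin.
        PluginManager.addPluginModule("validate/EFM")
        PluginManager.reset()
        self.modelManager.loadCustomTransforms()
        self.modelManager.formulaOptions = FormulaOptions()
        # Send log messages to stderr instead of a log file.
        self.startLogging(logFileName="logToStdErr",
                           logFormat="[%(messageCode)s] %(message)s - %(file)s",
                           logLevel="DEBUG",
                           logRefObjectProperties=True,
                           logToBuffer=False
                           )

    def loadReport(self, f):
        # Open the report file and load it into an XBRL model.
        fs = arelle.FileSource.openFileSource(f, self)
        model = self.modelManager.load(fs)
        # Print the concept name and value of every fact in the report.
        for fact in model.facts:
            print("%s: %s" % (fact.concept.name, fact.value))


# The path to the report is taken from the first command-line argument.
xbrl = ArelleLoader()
xbrl.loadReport(sys.argv[1])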

To run it, unzip the filing and pass the .htm file to the script:

python3 arelleSample.py aapl/aapl-20200926.htm 

The ModelFact object has quite a few properties and methods. You might find this code from the Inline XBRL viewer plugin useful to see what's possible.
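
As a rough illustration of those properties, the sketch below flattens a fact into a plain dict. The attribute names (qname, contextID, unitID, isNumeric, decimals, value) are what ModelFact exposes to the best of my knowledge, so verify them against your installed Arelle version. fact_to_row is a hypothetical helper, and the stand-in object at the bottom exists only so the snippet runs without Arelle.

```python
from types import SimpleNamespace


def fact_to_row(fact):
    """Collect a few commonly used fact attributes into a plain dict."""
    return {
        "concept": str(fact.qname),
        "context": fact.contextID,
        "unit": fact.unitID,
        "numeric": fact.isNumeric,
        "decimals": fact.decimals,
        "value": fact.value,
    }


# Stand-in for a real ModelFact; all values here are invented.
fake_fact = SimpleNamespace(
    qname="us-gaap:Assets", contextID="c1", unitID="usd",
    isNumeric=True, decimals="-6", value="1000000",
)
row = fact_to_row(fake_fact)
print(row)
```

With a loaded model, you would call fact_to_row(fact) for each fact in model.facts and collect the rows into a list or a CSV file.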

Upvotes: 3
