I am trying to parse and extract information from XBRL files. There seem to be a number of open-source packages that can parse XBRL files in Python, but documentation on how to use them is lacking. The packages I have investigated are:
*For reasons that I won't get into, I'm currently unable to use Brel (essentially because it requires Python >= 3.10).
However, the other packages don't seem to let me parse downloaded XBRL files offline. The structure of my downloaded XBRL files is as follows (taking Apple's SEC filing as an example):
aapl-20200926
|
+-- aapl-20200926.xsd
+-- aapl-20200926_cal.xml
+-- aapl-20200926_def.xml
+-- aapl-20200926_lab.xml
+-- aapl-20200926_pre.xml
+-- aapl-20200926_htm.xml
I'm aware this isn't the typical purpose of Stack Overflow, but does anyone know how I could start parsing XBRL files in the above structure (rather than from a link on the internet) in a Python script? I expect there is some way to do this using Arelle and its Python API, or with py-xbrl, but I haven't been able to crack it so far.
The easiest way to get started with Arelle is to download the complete ZIP of the filing from the SEC. Annoyingly, it's not directly linked from the page you linked to, but you can find it by opening the iXBRL file and going to Menu -> Save XBRL Zip File. Or you can just replace -index.htm with -xbrl.zip in the filing index URL:
https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/0000320193-20-000096-xbrl.zip
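If you need this for several filings, the substitution is easy to script. Here is a minimal standard-library sketch; `xbrl_zip_url` and `download` are hypothetical helper names, and the `User-Agent` string is a placeholder to replace with your own name and email, since the SEC asks automated tools to identify themselves and may reject requests without a descriptive `User-Agent`.

```python
import urllib.request


def xbrl_zip_url(index_url: str) -> str:
    """Turn a filing's ...-index.htm URL into its ...-xbrl.zip URL."""
    suffix = "-index.htm"
    if not index_url.endswith(suffix):
        raise ValueError("expected a URL ending in -index.htm: " + index_url)
    return index_url[: -len(suffix)] + "-xbrl.zip"


def download(url: str, dest: str, user_agent: str = "Your Name you@example.com") -> None:
    """Save url to dest, sending a descriptive User-Agent header (placeholder value)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

For this filing, `xbrl_zip_url("https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/0000320193-20-000096-index.htm")` gives the ZIP URL above, which you can then pass to `download(..., "0000320193-20-000096-xbrl.zip")`.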
To get started, try this command line:
python3 $HOME/Arelle/arelleCmdLine.py -v \
-f 0000320193-20-000096-xbrl.zip \
--plugins 'validate/EFM|saveLoadableOIM' \
--saveLoadableOIM=aapl.json
This should convert your downloaded file into xBRL-JSON format, saved as aapl.json.
The validate/EFM plugin is needed for SEC filings, as they use some custom transforms.
The saveLoadableOIM plugin enables the xBRL-JSON functionality.
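If you want to batch several filings, the same conversion can be driven from Python, and because the output is plain JSON you don't need Arelle to read it afterwards. The sketch below calls the command line through `subprocess` (passing a list of arguments, so the `|` needs no shell quoting) and then walks the facts. The helper names are hypothetical, the `$HOME/Arelle` path mirrors the command above, and the field names assume the standard xBRL-JSON layout (a `facts` object keyed by fact ID, each fact with a `value` and a `dimensions` object holding `concept` and `period`); check them against your own `aapl.json`.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Iterator, List, Optional, Tuple

# Assumed location of an Arelle source checkout, matching the command above.
ARELLE_CMDLINE = Path.home() / "Arelle" / "arelleCmdLine.py"

Fact = Tuple[Optional[str], Optional[str], object]


def arelle_cmd(filing: str, out_json: str) -> List[str]:
    """Build the argv for converting a filing to xBRL-JSON with Arelle."""
    return [
        sys.executable, str(ARELLE_CMDLINE),
        "-v",
        "-f", filing,
        # No shell is involved, so the "|" is passed through literally.
        "--plugins", "validate/EFM|saveLoadableOIM",
        "--saveLoadableOIM=" + out_json,
    ]


def convert(filing: str, out_json: str) -> None:
    """Run Arelle; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(arelle_cmd(filing, out_json), check=True)


def iter_facts(doc: dict) -> Iterator[Fact]:
    """Yield (concept, period, value) for each fact in an xBRL-JSON document."""
    for fact in doc.get("facts", {}).values():
        dims = fact.get("dimensions", {})
        yield dims.get("concept"), dims.get("period"), fact.get("value")


def load_facts(path: str) -> List[Fact]:
    """Read an xBRL-JSON file (e.g. aapl.json) and return its facts as tuples."""
    with open(path, encoding="utf-8") as f:
        return list(iter_facts(json.load(f)))
```

Typical use would be `convert("0000320193-20-000096-xbrl.zip", "aapl.json")` followed by a loop over `load_facts("aapl.json")`.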
To get started with Python, here's a fairly minimal script that dumps out all facts in the report:
import sys

from arelle import Cntlr, PluginManager
from arelle.ModelFormulaObject import FormulaOptions
import arelle.FileSource


class ArelleLoader(Cntlr.Cntlr):
    def __init__(self):
        # Headless controller: no GUI, just the model manager
        super().__init__(hasGui=False)
        # SEC filings use custom iXBRL transforms, provided by the EFM plugin
        PluginManager.addPluginModule("validate/EFM")
        PluginManager.reset()
        self.modelManager.loadCustomTransforms()
        self.modelManager.formulaOptions = FormulaOptions()
        # Send Arelle's log messages to stderr
        self.startLogging(logFileName="logToStdErr",
                          logFormat="[%(messageCode)s] %(message)s - %(file)s",
                          logLevel="DEBUG",
                          logRefObjectProperties=True,
                          logToBuffer=False
                          )

    def loadReport(self, f):
        # Open the file and load it into an XBRL model
        fs = arelle.FileSource.openFileSource(f, self)
        model = self.modelManager.load(fs)
        # Print the concept name and value of every fact in the report
        for fact in model.facts:
            print("%s: %s" % (fact.concept.name, fact.value))


if __name__ == "__main__":
    xbrl = ArelleLoader()
    xbrl.loadReport(sys.argv[1])
To run it, you will need to unzip the filing and then feed it the .htm file:
python3 arelleSample.py aapl/aapl-20200926.htm
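Unzipping and picking the right file can also be scripted with the standard `zipfile` module, and the same lookup works on a folder you have already unzipped, like the one in the question. In this sketch, `find_entry_point` and `extract_filing` are hypothetical helpers; they assume the files sit at the top level of the ZIP and that there is a single report file. They prefer the `.htm` document and fall back to the `*_htm.xml` instance that EDGAR extracts from it, which Arelle should also be able to load as a plain XBRL instance.

```python
import zipfile
from pathlib import Path


def find_entry_point(folder: str) -> Path:
    """Return the report file to load from an unzipped filing folder.

    Prefers the Inline XBRL .htm document and falls back to the
    EDGAR-extracted *_htm.xml instance (naming assumed from aapl-20200926).
    """
    root = Path(folder)
    candidates = sorted(root.glob("*.htm")) or sorted(root.glob("*_htm.xml"))
    if not candidates:
        raise FileNotFoundError("no .htm or _htm.xml report in " + str(root))
    # Assumption: the first (usually only) match is the report itself.
    return candidates[0]


def extract_filing(zip_path: str, dest: str) -> Path:
    """Extract a filing ZIP into dest and return the report file inside it."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return find_entry_point(dest)
```

For the folder in the question, which has no `.htm` file, `find_entry_point("aapl-20200926")` should return the `aapl-20200926_htm.xml` path; either result can be passed to `loadReport` as a string. Bear in mind that the company schema imports the standard US-GAAP taxonomy by URL, so Arelle may still need to go online once to fetch and cache those files.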
The ModelFact object has quite a few properties and methods. You might find the code in the Inline XBRL Viewer plugin for Arelle useful for seeing what's possible.