Reputation: 91
I need to get the name, value, and contextref of every ix:nonfraction tag.
A tag looks like this:
<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>
The output I need is:
TangibleFixedAssets, FY1.end, 238,011
The string that the regex has to search contains many of these tags, so is there a way to keep the three outputs together (concatenated, or at the same index of a list)?
Upvotes: 1
Views: 279
Reputation: 12168
import bs4

html = '''<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'''
soup = bs4.BeautifulSoup(html, 'lxml')

# find every ix:nonfraction tag in the document
ixs = soup.find_all('ix:nonfraction')
for ix in ixs:
    # drop the namespace prefix, e.g. "uk-gaap:TangibleFixedAssets" -> "TangibleFixedAssets"
    name = ix['name'].split(':')[-1]
    contextref = ix['contextref']
    text = ix.text
    # one list per tag keeps the three values together
    output = [name, contextref, text]
    print(output)
out:
['TangibleFixedAssets', 'FY1.END', '238,011']
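Since the question mentions a regex: if you would rather stay with the standard library, re.findall with three capture groups returns one tuple per tag, which keeps the three values at the same index. This is only a rough sketch under some assumptions that are mine, not from the question: attributes are double-quoted, name comes before contextref inside the tag, and the tag body contains no nested markup. The second tag in the sample (Debtors, FY2.END, 1,500) is made up for illustration. A real parser like the one above is more robust if the markup varies.

```python
import re

# Sample input with two tags; the second tag is hypothetical, added only
# to show that several matches are collected.
text = (
    '<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" '
    'unitref="GBP" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'
    '<ix:nonfraction name="uk-gaap:Debtors" contextref="FY2.END" '
    'unitref="GBP" decimals="0" format="ixt:numcommadot">1,500</ix:nonfraction>'
)

# Group 1: name without its namespace prefix
# Group 2: contextref
# Group 3: the text between the opening and closing tag
# Assumes name appears before contextref and both are double-quoted.
pattern = re.compile(
    r'<ix:nonfraction\b[^>]*?\bname="(?:[^":]*:)?([^"]+)"'
    r'[^>]*?\bcontextref="([^"]+)"'
    r'[^>]*>([^<]*)</ix:nonfraction>',
    re.IGNORECASE,
)

# findall returns a list of (name, contextref, value) tuples, one per tag
rows = pattern.findall(text)
for row in rows:
    print(', '.join(row))
```

Each element of rows is a 3-tuple, so rows[0] holds the name, contextref, and value of the first tag together.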
Upvotes: 1