Reputation: 2315
I am using BeautifulSoup to parse Tableau twb XML files to get a list of the worksheets in the report.
The XML that holds the value I am looking for is
<window class='worksheet' name='ML Productivity'>
I am struggling with how to find all of the elements with class='worksheet' and then read the name value from each of them, e.g. I want to get the 'ML Productivity' value.
The code I have so far is below.
import sys, os
import bs4 as bs

twbpath = "C:/tbw tbwx files/"
outpath = "C:/out/"
outFile = open(outpath + 'output.txt', "w")
#twbList = open(outpath + 'twb.txt', "w")

for subdir, dirs, files in os.walk(twbpath):
    for file in files:
        if file.endswith('.twb'):
            print(subdir.replace(twbpath,'') + '-' + file)
            filepath = open(subdir + '/' + file, encoding='utf-8').read()
            soup = bs.BeautifulSoup(filepath, 'xml')
            classnodes = soup.findAll('window')
            for classnode in classnodes:
                if str(classnode) == 'worksheet':
                    outFile.writelines(file + ',' + str(classnode) + '\n')
                    print(subdir.replace(twbpath,'') + '-' + file, classnode)

outFile.close()
Upvotes: 1
Views: 1389
Reputation: 473853
You can filter the desired window element by the class attribute value and then treat the result like a dictionary to get the desired attribute:
soup.find('window', {'class': 'worksheet'})['name']
If there are multiple window elements you need to locate, use find_all():
for window in soup.find_all('window', {'class': 'worksheet'}):
    print(window['name'])
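To show the lookup end to end, here is a minimal, self-contained sketch. The SAMPLE_TWB string is a hypothetical, heavily trimmed stand-in for a real .twb file, and worksheet_names is an illustrative helper name, not part of BeautifulSoup. It assumes lxml is installed, since the 'xml' parser depends on it.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for the contents of a .twb file.
SAMPLE_TWB = """<workbook>
  <windows>
    <window class='worksheet' name='ML Productivity'/>
    <window class='dashboard' name='Overview'/>
    <window class='worksheet' name='Sales by Region'/>
  </windows>
</workbook>"""


def worksheet_names(xml_text):
    """Return the name of every <window class='worksheet'> element."""
    soup = BeautifulSoup(xml_text, 'xml')  # the 'xml' parser requires lxml
    windows = soup.find_all('window', {'class': 'worksheet'})
    # .get() returns None for a missing attribute instead of raising KeyError,
    # so windows without a name are skipped rather than crashing the loop.
    return [w.get('name') for w in windows if w.get('name') is not None]


names = worksheet_names(SAMPLE_TWB)
print(names)
```

In the loop from the question, the same helper could be called on the text read from each file, writing one line per returned name to the output file.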
Upvotes: 1