curtisp

Reputation: 2315

BeautifulSoup xml get class name value

I am using BeautifulSoup to parse Tableau .twb XML files to get a list of the worksheets in each report.

The XML element that holds the value I am looking for is:

<window class='worksheet' name='ML Productivity'>

I am struggling with how to find all of the window elements that have class='worksheet' and then read the name attribute from each one. For example, from the element above I want to get the value 'ML Productivity'.

The code I have so far is below.

import sys, os
import bs4 as bs

twbpath = "C:/tbw tbwx files/"

outpath = "C:/out/"

outFile = open(outpath + 'output.txt', "w")
#twbList = open(outpath + 'twb.txt', "w")

for subdir, dirs, files in os.walk(twbpath):
    for file in files:
        if file.endswith('.twb'):
            print(subdir.replace(twbpath,'') + '-' + file)
            filepath = open(subdir + '/' + file, encoding='utf-8').read()           
            soup = bs.BeautifulSoup(filepath, 'xml')
            classnodes = soup.findAll('window')

            for classnode in classnodes:
                if str(classnode) == 'worksheet':
                    outFile.writelines(file + ',' +  str(classnode) + '\n')
                    print(subdir.replace(twbpath,'') + '-' + file, classnode)   

outFile.close()

Upvotes: 1

Views: 1389

Answers (1)

alecxe

Reputation: 473853

You can filter the desired window element by the class attribute value and then treat the result like a dictionary to get the desired attribute:

soup.find('window', {'class': 'worksheet'})['name']

If there are multiple window elements you need to locate, use find_all():

for window in soup.find_all('window', {'class': 'worksheet'}):
    print(window['name'])
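
For context, here is a minimal, self-contained sketch of the same idea that can be run without a Tableau file. The sample markup, the SAMPLE_TWB constant and the worksheet_names() helper are made up for illustration; only the 'ML Productivity' window comes from the question. It assumes lxml is installed, since BeautifulSoup's 'xml' parser depends on it. Note that str() on a tag returns the whole serialized element, so comparing it to 'worksheet' as in the question never matches; filtering on the attribute avoids that.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the contents of a .twb file.
SAMPLE_TWB = """
<workbook>
  <windows>
    <window class='worksheet' name='ML Productivity'/>
    <window class='dashboard' name='Overview'/>
    <window class='worksheet' name='Sales'/>
  </windows>
</workbook>
"""


def worksheet_names(markup):
    """Return the name attribute of every <window class='worksheet'> element."""
    soup = BeautifulSoup(markup, 'xml')  # the 'xml' parser requires lxml
    return [
        window['name']
        for window in soup.find_all('window', {'class': 'worksheet'})
        if window.has_attr('name')  # skip windows that carry no name
    ]


if __name__ == '__main__':
    for name in worksheet_names(SAMPLE_TWB):
        print(name)
```

In the loop from the question, the same helper could be called on the text read from each .twb file, writing one line per returned name to the output file. Keep in mind that find() returns None when nothing matches, so indexing its result directly raises a TypeError for workbooks that have no worksheet windows; find_all() simply returns an empty list in that case.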

Upvotes: 1
