Reputation: 309
The tag structure looks like this:
<div class="a">
<div class="b"></div>
<div class="c"></div>
Text...
</div>
I would like to get the content of the div with class a, excluding the nested divs with class b and c.
How can I do this?
Upvotes: 0
Views: 130
Reputation: 1135
Parsing the XML with lxml's etree and using an XPath expression to select the text nodes you want is probably the best solution here, combined with some Python string manipulation as needed. A demonstration in IPython:
In [1]: from lxml import etree
In [2]: markup = '''<div class="a">
...: <div class="b">Unwanted</div>
...: <div class="c">Unwanted</div>
...: Text...
...: </div>'''
In [3]: root = etree.fromstring(markup)
In [4]: root.xpath("//div[@class='a']/text()")
Out[4]: ['\n ', '\n ', '\n Text...\n']
In [5]: ''.join(root.xpath("//div[@class='a']/text()")).strip()
Out[5]: 'Text...'
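If lxml is not available, a similar result can be sketched with the standard library's xml.etree.ElementTree, which has no text() XPath step. This is a different technique from the session above: it relies on ElementTree storing the text before the first child in .text and the text after each child's closing tag in that child's .tail. The helper name direct_text and the variable markup are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET


def direct_text(element):
    """Return the text directly inside element, skipping text inside child elements."""
    # Text before the first child lives in element.text (None if absent).
    parts = [element.text or ""]
    # Text following each child's closing tag is stored in that child's .tail.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


# Sample markup mirroring the structure in the question.
markup = '''<div class="a">
  <div class="b">Unwanted</div>
  <div class="c">Unwanted</div>
  Text...
</div>'''

root = ET.fromstring(markup)
result = direct_text(root)
print(result)
```

Like etree.fromstring above, this assumes the input is well-formed XML; real-world HTML that is not well-formed would need an HTML parser instead.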
Upvotes: 1
Reputation: 483
You can use soup.find_all for this. Note that it returns the matching div elements whole, including the nested class b and c divs:
filtered_divs = soup.find_all("div", {"class": "a"})
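To keep only the text that belongs to the class a div itself, one possible sketch with BeautifulSoup is to collect its direct text nodes. It assumes the bs4 package is installed and uses the built-in "html.parser" backend; the names markup, div_a and own_text are illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the structure in the question.
markup = '''<div class="a">
  <div class="b">Unwanted</div>
  <div class="c">Unwanted</div>
  Text...
</div>'''

soup = BeautifulSoup(markup, "html.parser")

# Locate the outer div by its class.
div_a = soup.find("div", class_="a")

# string=True matches text nodes only; recursive=False restricts the
# search to direct children, so text inside the b and c divs is skipped.
own_text = "".join(div_a.find_all(string=True, recursive=False)).strip()
print(own_text)
```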
Upvotes: 0