AutomaticStatic

Reputation: 1729

Parsing HTML with <br> tags (Python)

I'm using lxml to parse some HTML. The HTML looks like:

<td valign="top">first text field<br>second text field</td>

And no, the break tag isn't closed anywhere down the line.

element.text returns the first of the two, and element.xpath('string()') returns both with no \n or other separator.
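
Here is a minimal reproduction of what I'm seeing. The surrounding table, the fromstring call and the variable names are assumptions added so the snippet runs on its own; the real page is larger.

```python
from lxml import html

# Assumed wrapper: the real cell sits somewhere inside a larger page.
table = html.fromstring(
    '<table><tr>'
    '<td valign="top">first text field<br>second text field</td>'
    '</tr></table>'
)
element = table.find('.//td')

# .text only holds the text that comes before the first child element (<br>).
first_only = element.text

# string() concatenates every text node under the element, with no separator.
joined = element.xpath('string()')

print(repr(first_only))
print(repr(joined))
```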

I figure I can just parse twice and "subtract" the former from the latter if I want only the second text field, but that's not ideal. I assume there must be some other way of getting that second text field but I'm stumped.

Upvotes: 3

Views: 1181

Answers (1)

AutomaticStatic

Reputation: 1729

Answered it myself: element.xpath('text()') returns a list containing both of the text fields I'm looking for.
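
For anyone landing here, a rough sketch of how that looks in practice. The wrapping table and the variable names are made up for the example; only the td markup comes from the question.

```python
from lxml import html

# Assumed wrapper so the fragment parses on its own.
table = html.fromstring(
    '<table><tr>'
    '<td valign="top">first text field<br>second text field</td>'
    '</tr></table>'
)
element = table.find('.//td')

# text() selects the element's direct text nodes, one list item per node.
parts = element.xpath('text()')
second = parts[1]

# The same string is also stored as the tail of the <br> element:
# lxml keeps text that follows a child element in that child's .tail.
tail = element.find('br').tail

print(parts)
print(second)
```

Indexing into the text() result is fine when the cell always has exactly two fields. If the number of <br> tags can vary, it is probably safer to loop over the list than to rely on a fixed index.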

Upvotes: 2
