dethtron5000

Beautiful Soup 4 CSS sibling selector

I'm trying to parse some HTML exported from an InDesign document with Beautiful Soup 4 and Python 2.7. I want to find a specific tag using a CSS sibling selector. I can reach the tag I want by selecting its sibling with a CSS selector and then calling Beautiful Soup's find_next_sibling() method, but I can't select it directly with a CSS selector.

I have verified that the selector itself is valid by trying it in plain CSS/JS (http://jsfiddle.net/Sj63x/1/). I have also tried all three parsers recommended on the Beautiful Soup home page.

The relevant code is below (the HTML text is in the JSFiddle):

from bs4 import BeautifulSoup

# `text` holds the exported HTML (see the JSFiddle); lxml and html5lib give the same results
text = BeautifulSoup(text, "html.parser")

# This finds the sibling
sibling = text.select(".Book-Title-")
print(sibling[0].string)

# This finds the sibling I am looking for
targetText = sibling[0].find_next_sibling()
print(targetText.string)

# This should find the same element but returns an empty list
targetText2 = text.select(".Book-Title- ~.Text")
print(targetText2)

# Other attempted variations; these also return empty lists
targetText3 = text.select(".Book-Title- ~ .Text")
targetText4 = text.select(".Book-Title- + .Text")
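
The exported HTML is only in the JSFiddle, so the sketch below uses a small, hypothetical stand-in document (the `<div>`/`<p>` markup and the paragraph text are invented for illustration). It puts the find_next_sibling() workaround next to the direct sibling selectors. It assumes Beautiful Soup 4.7 or later, where select() is backed by the soupsieve library and accepts `~` and `+` with or without surrounding spaces. Older 4.x releases shipped a much more limited built-in selector engine, which may explain the empty lists above, though that is not confirmed for the version used here.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the InDesign export (the real HTML is in the JSFiddle)
html = """
<div>
  <p class="Book-Title-">Example Title</p>
  <p class="Text">First paragraph</p>
  <p class="Text">Second paragraph</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Workaround from the question: select the title, then step to the next tag sibling
# (find_next_sibling() skips the whitespace text nodes between the <p> tags)
title = soup.select_one(".Book-Title-")
via_sibling = title.find_next_sibling()
print(via_sibling.get_text())

# Direct selectors (assumes Beautiful Soup >= 4.7, i.e. soupsieve-backed select())
general = soup.select(".Book-Title- ~ .Text")   # every later .Text sibling
no_space = soup.select(".Book-Title- ~.Text")   # same selector, no space after ~
adjacent = soup.select(".Book-Title- + .Text")  # only the immediately following sibling

print([p.get_text() for p in general])
print([p.get_text() for p in no_space])
print([p.get_text() for p in adjacent])
```

`~` (general sibling) matches every later sibling with class `Text`, while `+` (adjacent sibling) matches only the one directly after the title, which is the element find_next_sibling() returns.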

Answers (1)

HAL

Try using this selector instead:

targetText2 = text.select(".Book-Title- + .Text")

or add a space between the tilde and the sibling's class selector:

targetText2 = text.select(".Book-Title- ~ .Text")
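
Adding the space matters if the selector string is split on whitespace before combinators are recognized. As an assumption (not verified against the asker's Beautiful Soup version), older 4.x releases tokenized selectors roughly this way, so in `.Book-Title- ~.Text` the tilde stays glued to `.Text` and is never treated as a combinator. The toy tokenizer below is hypothetical (`tokenize()` is not a Beautiful Soup function) and only illustrates that effect; Beautiful Soup 4.7+ delegates to soupsieve, which does not need the spaces.

```python
# Hypothetical whitespace-splitting tokenizer: it illustrates, but does not
# reproduce, how a naive selector parser sees sibling combinators.
COMBINATORS = {">", "~", "+"}


def tokenize(selector):
    """Split on whitespace; a token is a combinator only if it is exactly >, ~ or +."""
    return [
        ("combinator", token) if token in COMBINATORS else ("simple", token)
        for token in selector.split()
    ]


unspaced = tokenize(".Book-Title- ~.Text")
spaced = tokenize(".Book-Title- ~ .Text")
adjacent = tokenize(".Book-Title- + .Text")

for tokens in (unspaced, spaced, adjacent):
    print(tokens)
```

Only the spaced forms yield a separate `~` or `+` token that such a parser could act on.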
