How to get the HTML element that comes before an element with a certain class?

Question

I'm scraping a page and having trouble getting the "th" element that comes before another "th" element with the class "type2". I'd prefer to select it by that relationship, as the "th" that precedes the "th" with class "type2", because my HTML has many "th" elements and that was the only difference I found between the tables.

Using rvest or xml2 (or another R package), can I get this element? The content I want is "text_that_I_want".

Thank you!


    text_that_I_want
    
        
            
                
                    name
                    answers
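A minimal document with the structure described above can be sketched with xml2, assuming the markup is roughly a "th" holding "text_that_I_want", followed by a "td" that wraps a nested table whose header cells carry class="string type2". The "table", "thead" and "tr" wrappers are assumptions; only the relationship between the cells comes from the question and the XPath in the answer.

```r
library(xml2)

# Hypothetical markup: the table/thead/tr wrappers are assumed.
# Only the th -> td -> nested th.type2 relationship is taken from the post.
htmldoc <- '
<table>
  <tr>
    <th>text_that_I_want</th>
    <td>
      <table>
        <thead>
          <tr>
            <th class="string type2">name</th>
            <th class="string type2">answers</th>
          </tr>
        </thead>
      </table>
    </td>
  </tr>
</table>'

doc <- read_html(htmldoc)

# The "type2" header cells can be matched directly by their class;
# the wanted th has no class, so it must be reached relative to them.
type2_text <- xml_text(xml_find_all(doc, "//th[@class = 'string type2']"))
type2_text
```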

Allan Cameron · Accepted Answer

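The formal and more generalizable way to navigate XPath relative to a given node is via the ancestor and preceding-sibling axes: climb from the "th" with class "type2" to its enclosing "td", then step back to the "th" that precedes that "td":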

```r
library(rvest)
# `htmldoc` holds the page's HTML (a string, file path or URL)
read_html(htmldoc) %>%
  html_nodes(xpath = "//th[@class = 'string type2']/ancestor::td/preceding-sibling::th") %>%
  html_text()
#> [1] "text_that_I_want"
```

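If the class attribute may carry other tokens besides "type2" (for example "string type2" in one table and "type2 numeric" in another), an exact @class = '...' comparison will miss some cells. A common XPath 1.0 idiom matches a single class token instead. The sketch below is a variation on the accepted XPath, using xml2 only and hypothetical markup; it also adds [1] so that only the nearest enclosing "td" and the nearest preceding "th" are used.

```r
library(xml2)

# Hypothetical markup; only the th -> td -> nested th.type2 relationship
# is taken from the question. Note the differing class attributes.
html <- '<table><tr>
  <th>text_that_I_want</th>
  <td><table><tr>
    <th class="type2 numeric">name</th>
    <th class="string type2">answers</th>
  </tr></table></td>
</tr></table>'

doc <- read_html(html)

# Match "type2" as a whole class token regardless of other classes,
# then take the nearest td ancestor and the th immediately before it.
xp <- paste0(
  "//th[contains(concat(' ', normalize-space(@class), ' '), ' type2 ')]",
  "/ancestor::td[1]/preceding-sibling::th[1]"
)

wanted <- xml_text(xml_find_all(doc, xp))
wanted
#> [1] "text_that_I_want"
```

Because an XPath node-set contains each node at most once, both "type2" cells lead to the same single "th", so the result has length one.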