Jacksonsox

Reputation: 1233

R WebScraping Getting Extra Text when using Rvest

I'm trying to scrape sold dates from eBay in R using rvest.

The URL, stored in the variable url, is literally:

https://www.ebay.com/sch/Star%20Wars%20%20BARC%20Speeder%20Bike%20Trooper%20Buzz%20-2009%20-Red%20-Obi-wan%20-Kenobi%20-Jesse%20-halmark%20-Funko%20-Pop%20-Black%20-snaptite%20-model%20-30th%20-Saga%20-Lego%20-McDonalds%20-McDonald%27s%20-Topps%20-Heroes%20-Playskool%20-Transformers%20-Titanium%20-Die-Cast%20-2003%20-2004%20-2005%20-2006%20-2007%20-2008%20-2012%20-2013%20%28Clone%20Wars%29&LH_Sold=1&LH_ItemCondition=3&_dmd=7&_ipg=200&LH_Complete=1&LH_PrefLoc=1

The full xpath to the first item sold date is: //*[@id="srp-river-results"]/ul/li[1]/div/div[2]/div[2]/div/span/span[1]

If I select that path and call html_text() on it, I get nothing: character(0).

When I drop the trailing spans from the path and select the node with class POSITIVE instead, I get the date, but with a bunch of extra characters mixed in.

R code:

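library(rvest)

# url holds the eBay search address shown above
readHTML <- url %>%
    read_html()

# Narrow to the first listing's detail block, then take the POSITIVE node
SoldDate <- readHTML %>%
    html_nodes(xpath = '//*[@id="srp-river-results"]/ul/li[1]/div/div[2]/div[2]/div') %>%
    html_nodes("[class='POSITIVE']") %>%
    html_text(trim = TRUE)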

Result:

"SoYlPd N Feb 316,Z RM9USI2021"

I should get:

"Feb 16, 2021"


Upvotes: 0

Views: 54

Answers (1)

Jacksonsox

Reputation: 1233

There are two great answers that cover this issue in more detail here: Rvest Split Data by Class Name where the class names change
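For readers who want a concrete picture of the idea behind that title, here is a minimal sketch. It assumes the scrambled string comes from sibling spans inside the POSITIVE node, where real characters and decoy characters alternate and differ only by class name. The markup below is made up for illustration (the class names s1 and s2 are hypothetical, not eBay's), and picking the "real" class by matching a date pattern is just one possible heuristic.

```r
library(rvest)

# Hypothetical markup: real and decoy characters alternate in sibling spans.
# The class names "s1" and "s2" are invented for this sketch.
html <- paste0(
  '<div><span class="POSITIVE">',
  '<span class="s1">So</span><span class="s2">Y</span>',
  '<span class="s1">l</span><span class="s2">P</span>',
  '<span class="s1">d</span><span class="s2">N</span>',
  '<span class="s1"> Feb </span><span class="s2">3</span>',
  '<span class="s1">16,</span><span class="s2">Z</span>',
  '<span class="s1"> 2021</span>',
  '</span></div>'
)
page <- read_html(html)

# Reading the whole node concatenates every span, decoys included
scrambled <- page %>% html_nodes(".POSITIVE") %>% html_text()

# Read the child spans one by one, keeping each span's class
spans   <- page %>% html_nodes(".POSITIVE > span")
pieces  <- html_text(spans)
classes <- html_attr(spans, "class")

# Rebuild one candidate string per class
by_class <- vapply(split(pieces, classes), paste, character(1), collapse = "")

# Heuristic (an assumption): the real class is the one containing a date
date_rx   <- "[A-Z][a-z]{2} [0-9]{1,2}, [0-9]{4}"
hit       <- by_class[grepl(date_rx, by_class)]
sold_date <- unname(regmatches(hit, regexpr(date_rx, hit)))

print(scrambled)
print(sold_date)
```

Because the class is chosen by what its text looks like rather than by name, the same approach should keep working if the class names change between requests, as long as the markup really has this shape.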

Upvotes: 0
