Reputation: 489
As a practice project, I am trying to scrape property data from a website. (I only intend to practice my web scraping skills and have no intention of using the scraped data further.) However, some properties have no price available, so the name and price vectors end up with different lengths, which causes an error when I try to combine them into one data frame.
Here is the code for scraping:
library(tidyverse)
library(rvest)
web_page <- read_html("https://wx.fang.anjuke.com/loupan/all/a1_p2/")
community_name <- web_page %>%
html_nodes(".items-name") %>%
html_text()
length(community_name)
listed_price <- web_page %>%
html_nodes(".price") %>%
html_text()
length(listed_price)
property_data <- data.frame(
name=community_name,
price=listed_price
)
How can I identify the properties with no listed price and fill the price variable with NA when no value is scraped?
Upvotes: 0
Views: 220
Reputation: 33782
Inspection of the web page shows that the class is .price when the price has a value, and .price-txt when it does not. So one solution is to use an XPath expression in html_nodes() and match classes that start with "price":
listed_price <- web_page %>%
html_nodes(xpath = "//p[starts-with(@class, 'price')]") %>%
html_text()
length(listed_price)
[1] 60
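Note that the XPath match returns the placeholder text for unpriced listings rather than NA. If an actual NA is wanted, one possible sketch is to select each listing's container first and then call html_node() (singular) inside it, since html_node() returns a missing node, and html_text() an NA, when nothing matches. The example below runs on a small inline HTML stand-in rather than the real page; the ".item-mod" container class and the sample names and prices are assumptions made up for illustration, so the selector would need to be checked against the actual markup.

```r
library(rvest)

# Inline stand-in for the listing page. The ".item-mod" container class
# and the sample values are assumptions, not taken from the real site.
page <- read_html('
  <div class="item-mod"><span class="items-name">A</span><p class="price">12000</p></div>
  <div class="item-mod"><span class="items-name">B</span></div>
  <div class="item-mod"><span class="items-name">C</span><p class="price-txt">pending</p></div>
')

# One node per listing, so every column has the same length.
items <- html_nodes(page, ".item-mod")

property_data <- data.frame(
  name  = items %>% html_node(".items-name") %>% html_text(),
  # "p.price" matches only the exact class "price", not "price-txt";
  # listings without it yield NA.
  price = items %>% html_node("p.price") %>% html_text()
)

property_data
```

Because both columns are derived from the same set of listing nodes, they always line up row by row, and the unpriced properties can then be found with is.na(property_data$price).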
Upvotes: 1