Reputation: 483
library(rvest)
library(tidyverse)

urls <- str_c("https://news.ycombinator.com/news?p=", 1:2)

# Scrape the story titles from each page and stack them into one tibble
title <- urls %>%
  map(function(page_url) {
    read_html(page_url) %>%
      html_nodes("a.storylink") %>%
      html_text() %>%
      enframe(name = NULL)
  }) %>%
  bind_rows()
This gives a data frame with one column. I want to add a second column that holds, for each row, the URL of the page the title came from.
# A tibble: 6 x 2
value url
<chr> <chr>
1 1k True Fans? Try 100 https://news.ycombinator.com/news?p=1
2 FLIF – Free Lossless Image Format https://news.ycombinator.com/news?p=1
3 Critical Bluetooth Vulnerability in Android (CVE-2020-0022) https://news.ycombinator.com/news?p=1
4 The Rapid Growth of Io_uring https://news.ycombinator.com/news?p=1
5 Show HN: Building an open-source language-learning platform https://news.ycombinator.com/news?p=1
6 TV Backlight Compensation https://news.ycombinator.com/news?p=1
Upvotes: 2
Views: 64
Reputation: 23574
Here is one way. As you loop through the pages, build a data frame with two columns (url and title) for each page. map_dfr() then row-binds the per-page data frames into a single one.
library(rvest)
library(tidyverse)
map_dfr(.x = paste0("https://news.ycombinator.com/news?p=", 1:2),
        .f = function(x) {
          # One tibble per page: the page URL is recycled across all titles
          tibble(url = x,
                 title = read_html(x) %>%
                   html_nodes("a.storylink") %>%
                   html_text())
        })
url title
<chr> <chr>
1 https://news.ycombinator.com/news?p=1 1k True Fans? Try 100
2 https://news.ycombinator.com/news?p=1 Critical Bluetooth Vulnerability in Android (CVE-2020-0022)
3 https://news.ycombinator.com/news?p=1 FLIF – Free Lossless Image Format
4 https://news.ycombinator.com/news?p=1 The Rapid Growth of Io_uring
5 https://news.ycombinator.com/news?p=1 Show HN: Building an open-source language-learning platform
6 https://news.ycombinator.com/news?p=1 Why Google Might Prefer Dropping a $22B Business
7 https://news.ycombinator.com/news?p=1 TV Backlight Compensation
8 https://news.ycombinator.com/news?p=1 This person does not exist
9 https://news.ycombinator.com/news?p=1 Angular 9.0
10 https://news.ycombinator.com/news?p=1 Before the DNS: how yours truly upstaged the NIC's official HOSTS.TXT (2004)
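As an alternative sketch of the same idea, map_dfr() can fill the url column itself through its .id argument when the input is named. The example below is only an illustration: the `pages` vector holds made-up HTML snippets keyed by URL (an assumption, so that the code runs without network access), not real Hacker News content.

```r
library(rvest)
library(tidyverse)

# Hypothetical stand-in pages keyed by URL, so the sketch runs offline.
pages <- c(
  "https://news.ycombinator.com/news?p=1" =
    '<html><body><a class="storylink">Story A</a><a class="storylink">Story B</a></body></html>',
  "https://news.ycombinator.com/news?p=2" =
    '<html><body><a class="storylink">Story C</a></body></html>'
)

titles <- map_dfr(
  pages,
  function(html) {
    # read_html() also accepts a literal HTML string
    tibble(title = read_html(html) %>%
             html_nodes("a.storylink") %>%
             html_text())
  },
  .id = "url"  # the names of `pages` become the url column
)

print(titles)
```

With live pages, the same pattern should work by naming the URL vector after itself first, for example with set_names(urls), and calling read_html() on each URL inside the function.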
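If you want to add hnuser, add one more column. A simple way is the following. Note that it assumes each page returns the same number of a.hnuser and a.storylink nodes; if the counts differ, tibble() will stop with an error.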
map_dfr(.x = paste0("https://news.ycombinator.com/news?p=", 1:2),
        .f = function(x) {
          page <- read_html(x)  # download each page only once
          tibble(url = x,
                 title = page %>%
                   html_nodes("a.storylink") %>%
                   html_text(),
                 hnuser = page %>%
                   html_nodes("a.hnuser") %>%
                   html_text())
        })
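If some stories have no submitter, the title and hnuser vectors will have different lengths. One possible way around that, sketched below, is to select one container node per story and then use html_node() (singular), which returns a missing node, and therefore NA text, where nothing matches. The `div.item` wrapper and the sample HTML are hypothetical and are not Hacker News' actual markup; the selectors would need adapting to the real page.

```r
library(rvest)
library(tidyverse)

# Hypothetical markup: each story wrapped in its own <div class="item">.
# This is NOT Hacker News' actual layout; adapt the selectors to the real page.
html <- '<html><body>
  <div class="item"><a class="storylink">Story A</a><a class="hnuser">alice</a></div>
  <div class="item"><a class="storylink">Job ad</a></div>
  <div class="item"><a class="storylink">Story B</a><a class="hnuser">bob</a></div>
</body></html>'

# One node per story
items <- read_html(html) %>% html_nodes("div.item")

stories <- tibble(
  title  = items %>% html_node("a.storylink") %>% html_text(),
  hnuser = items %>% html_node("a.hnuser") %>% html_text()  # NA where absent
)

print(stories)
```

Because html_node() returns exactly one result per container, both columns always have the same length and each title stays aligned with its own submitter.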
Upvotes: 3