youraz

Reputation: 483

How do I get the link while the loop is running?

library(rvest)
library(tidyverse)

urls <- str_c("https://news.ycombinator.com/news?p=", seq(1, 2, 1))

# Scrape the story titles from each page and stack them into one data frame
title <- urls %>%
  map(function(df) {
    read_html(df) %>%
      html_nodes("a.storylink") %>%
      html_text() %>%
      enframe(name = NULL)
  }) %>%
  bind_rows()

This gives a data frame with one column. I want to add a second column that holds, for each row, the URL of the page the title came from, like this:

# A tibble: 6 x 2
  value                                                       url                                  
  <chr>                                                       <chr>                                
1 1k True Fans? Try 100                                       https://news.ycombinator.com/news?p=1
2 FLIF – Free Lossless Image Format                           https://news.ycombinator.com/news?p=1
3 Critical Bluetooth Vulnerability in Android (CVE-2020-0022) https://news.ycombinator.com/news?p=1
4 The Rapid Growth of Io_uring                                https://news.ycombinator.com/news?p=1
5 Show HN: Building an open-source language-learning platform https://news.ycombinator.com/news?p=1
6 TV Backlight Compensation                                   https://news.ycombinator.com/news?p=1

Upvotes: 2

Views: 64

Answers (1)

jazzurro

Reputation: 23574

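Here is one way. Inside the function that runs for each page, build a data frame with two columns: the page URL and the titles scraped from that page. The URL is a single value, so tibble() recycles it across all the title rows. map_dfr() then row-binds the per-page data frames into one.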

library(rvest)
library(tidyverse)

map_dfr(.x = paste("https://news.ycombinator.com/news?p=", 1:2, sep = ""),
        .f = function(x){tibble(url = x,
                                title = read_html(x) %>% 
                                        html_nodes("a.storylink") %>% 
                                        html_text()
                            )})

   url                                   title
   <chr>                                 <chr>
 1 https://news.ycombinator.com/news?p=1 1k True Fans? Try 100
 2 https://news.ycombinator.com/news?p=1 Critical Bluetooth Vulnerability in Android (CVE-2020-0022)
 3 https://news.ycombinator.com/news?p=1 FLIF – Free Lossless Image Format
 4 https://news.ycombinator.com/news?p=1 The Rapid Growth of Io_uring
 5 https://news.ycombinator.com/news?p=1 Show HN: Building an open-source language-learning platform
 6 https://news.ycombinator.com/news?p=1 Why Google Might Prefer Dropping a $22B Business
 7 https://news.ycombinator.com/news?p=1 TV Backlight Compensation
 8 https://news.ycombinator.com/news?p=1 This person does not exist
 9 https://news.ycombinator.com/news?p=1 Angular 9.0
10 https://news.ycombinator.com/news?p=1 Before the DNS: how yours truly upstaged the NIC's official HOSTS.TXT (2004)
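
Another option, as a minimal sketch: map_dfr() has an .id argument that, when the input is named, stores each element's name in a new column. Naming each input by its URL therefore yields the url column without building it by hand. To keep the example runnable offline, the pages vector below holds small made-up HTML strings (hypothetical stand-ins, not real Hacker News content) keyed by URL, and get_titles is an illustrative helper name.

```r
library(rvest)
library(tidyverse)

# Hypothetical stand-ins for downloaded pages, named by the URL they belong to.
pages <- c(
  "https://news.ycombinator.com/news?p=1" =
    '<html><body><a class="storylink">First title</a><a class="storylink">Second title</a></body></html>',
  "https://news.ycombinator.com/news?p=2" =
    '<html><body><a class="storylink">Third title</a></body></html>'
)

# Illustrative helper: return the story titles of one page as a one-column tibble.
get_titles <- function(page) {
  read_html(page) %>%
    html_nodes("a.storylink") %>%
    html_text() %>%
    enframe(name = NULL, value = "title")
}

# .id = "url" copies the name of each input element into a column called url.
result <- map_dfr(pages, get_titles, .id = "url")
result
```

With live pages, the same call should work on set_names(urls), since set_names() with no second argument names a vector by its own values and read_html() accepts a URL as well as literal HTML.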

If you also want hnuser (the submitter's username), add one more column. A simple way is the following.

map_dfr(.x = paste("https://news.ycombinator.com/news?p=", 1:2, sep = ""),
        .f = function(x){
          page <- read_html(x)  # download and parse each page only once
          # tibble() needs title and hnuser to have the same length
          tibble(url = x,
                 title = page %>% 
                         html_nodes("a.storylink") %>% 
                         html_text(),
                 hnuser = page %>% 
                          html_nodes("a.hnuser") %>% 
                          html_text()
          )})
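
The simple version assumes every story has an a.hnuser link. If one does not, title and hnuser end up with different lengths and tibble() stops with an error. One hedged way around that is to select each story row first and then use html_node() (singular), which returns exactly one result per row and gives NA where nothing matches. The sketch below assumes the submitter link sits in the table row right after each tr.athing story row. That layout is an assumption about the page, and the HTML here is made up for illustration so the example runs offline.

```r
library(rvest)
library(tidyverse)

# Made-up markup imitating the assumed layout: a story row (tr.athing)
# followed by a row that may or may not contain an a.hnuser link.
html <- '<table>
  <tr class="athing"><td><a class="storylink">Story with a submitter</a></td></tr>
  <tr><td class="subtext"><a class="hnuser">alice</a></td></tr>
  <tr class="athing"><td><a class="storylink">Job ad without one</a></td></tr>
  <tr><td class="subtext">1 hour ago</td></tr>
</table>'

# One node per story; everything else is looked up relative to these rows.
rows <- read_html(html) %>% html_nodes("tr.athing")

stories <- tibble(
  title  = rows %>% html_node("a.storylink") %>% html_text(),
  # Look in the next sibling row; a missing link becomes NA, not a dropped row.
  hnuser = rows %>%
    html_node(xpath = "following-sibling::tr[1]//a[@class='hnuser']") %>%
    html_text()
)
stories
```

Because both columns are derived from the same set of rows, they always have the same length, and stories without a submitter keep their title with hnuser set to NA.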

Upvotes: 3
