Reputation: 13
I'm scraping a table from a website. The scraped output currently looks like this:
1. Saturday
2. 4:00 PM
3. 5:30 PM
4. Sunday
5. 8:30 AM
6. 10:00 AM
I need it to come through like this instead, with each time paired with its day. I don't think html_table() can do that on its own, but I was hoping someone knows how to reformat the result afterwards in R.
1. Saturday 4:00 PM
2. Saturday 5:30 PM
3. Sunday 8:30 AM
4. Sunday 10:00 AM
Here is the code I'm using:
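library(rvest)   # read_html(), html_table(); also provides %>%
library(tidyr)   # unnest()
library(tibble)  # tibble()

urls <- 'https://www.life.church/edmond/'
times <- function(x) {
  try(
    x %>%
      read_html() %>%
      html_table(header = FALSE) %>%
      data.frame(x)
  )
}
# Apply the function to the urls
m <- lapply(urls, times)
# Convert to a data frame
data <- data.frame(unnest(tibble(m)))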
Upvotes: 1
Views: 47
Reputation: 42544
This is what I would do:
library(dplyr)
library(xml2)
library(rvest)
library(tidyr)
library(purrr)
times <- function(x) {
  try(
    x %>%
      read_html() %>%
      html_table(header = FALSE) %>%
      flatten() %>%
      as_tibble()
  )
}
urls <- c('https://www.life.church/edmond/', 'https://www.life.church/fortworth/')
lapply(urls, times) %>%
  set_names(urls) %>%
  bind_rows(.id = "URL") %>%
  separate(X1, into = c("Time", "Day"), sep = "(?=^\\D)") %>%
  fill(Day) %>%
  filter(Time != "") %>%
  select(URL, Day, Time)
# A tibble: 16 x 3
   URL                                Day       Time
   <chr>                              <chr>     <chr>
 1 https://www.life.church/edmond/    Saturday  4:00 PM
 2 https://www.life.church/edmond/    Saturday  5:30 PM
 3 https://www.life.church/edmond/    Sunday    8:30 AM
 4 https://www.life.church/edmond/    Sunday    10:00 AM
 5 https://www.life.church/edmond/    Sunday    11:30 AM
 6 https://www.life.church/edmond/    Sunday    1:00 PM
 7 https://www.life.church/edmond/    Sunday    4:00 PM
 8 https://www.life.church/edmond/    Sunday    5:30 PM
 9 https://www.life.church/edmond/    Wednesday 7:00 PM
10 https://www.life.church/fortworth/ Saturday  4:00 PM
11 https://www.life.church/fortworth/ Saturday  5:30 PM
12 https://www.life.church/fortworth/ Sunday    8:30 AM
13 https://www.life.church/fortworth/ Sunday    10:00 AM
14 https://www.life.church/fortworth/ Sunday    11:30 AM
15 https://www.life.church/fortworth/ Sunday    1:00 PM
16 https://www.life.church/fortworth/ Wednesday 7:00 PM
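separate() uses a look-ahead regular expression to split off the entries that do not start with a digit (the day names) into a new column, Day. fill() then carries each day down to the time rows beneath it, and filter() drops the rows whose Time is empty, which are the day-header rows.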
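If the look-ahead regex feels opaque, the same reshaping can be sketched without it, on a small hard-coded data frame so it runs offline. This is an alternative formulation using grepl() rather than the separate() call above. The names raw, schedule and Label are made up for illustration, and the sample values are just the six rows from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the single column (X1) that html_table() returns
raw <- data.frame(
  X1 = c("Saturday", "4:00 PM", "5:30 PM", "Sunday", "8:30 AM", "10:00 AM"),
  stringsAsFactors = FALSE
)

schedule <- raw %>%
  # Rows that do not start with a digit are day headers; time rows get NA
  mutate(Day = if_else(grepl("^\\D", X1), X1, NA_character_)) %>%
  # Carry each day header down to the time rows beneath it
  fill(Day) %>%
  # Keep only the time rows (those starting with a digit)
  filter(grepl("^\\d", X1)) %>%
  select(Day, Time = X1) %>%
  # Optional: a single combined string, as asked for in the question
  mutate(Label = paste(Day, Time))

print(schedule)
```

The Label column gives the "Saturday 4:00 PM" form from the question; drop that last mutate() if separate Day and Time columns are more useful.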
Upvotes: 1