Reputation: 5335
Can anyone help me scrape the details of the events listed on this web page and return them in a data frame with one row per event?
After inspecting the page source to find what I thought was the right class attribute, I tried the following, but it returns an empty node set. Ditto for html_nodes(".event-list-item").
library(rvest)
marches <- read_html("https://map.womensmarch.com/?source=website")
events <- html_nodes(marches, "event-list-item")
I'd like the output to capture at least the date, location, title, and whether the event is virtual or in-person.
Upvotes: 0
Views: 455
Reputation: 34591
You don't need to scrape the data in this case, and you can't with rvest, because the data is loaded via JavaScript after the page loads. Inspecting the page shows that the event info is JSON retrieved from another site, so it can be accessed directly:
library(jsonlite)
feed <- fromJSON("https://zen-hypatia-739ed6.netlify.app/feed")
dat <- feed$events
str(dat)
'data.frame': 313 obs. of 22 variables:
$ id : int 78 404 260 224 286 108 187 265 326 334 ...
$ public_description : chr "Meet up with signs for:\r\nVote Biden, protect rights of people with disabilities, protect Roe VS Wade, prote"| __truncated__ "The womxn of the Oceti Sakowin, the Seven Sacred Council Fires of the Great Sioux Nation are marching to the po"| __truncated__ "As part of Worcester County's regular Blue Honk and Wave sign holding event (every Friday until the election), "| __truncated__ "Standout for Social Justice \r\nWear Mask \r\nMaintain physical distance of at least 6 feet\r\nBring your signs"| __truncated__ ...
$ campaign : chr "oct-17-march" "oct-17-march" "oct-17-march" "oct-17-march" ...
$ lat : num 42.4 44.1 38.3 42.3 40.8 ...
$ lng : num -71.1 -103.2 -75.1 -71.4 -111.9 ...
$ title : chr "Get Up, Stand Up - Stand Up for Your Rights!" "Oceti Sakwin Womxn’s March 2020" "Honor RBG and Stand for Democracy" "Social Justice" ...
$ event_doors_open_at : logi NA NA NA NA NA NA ...
$ venue : chr "Public island at a major 4 way stop. Intersection of North Harvard St and Western Ave Boston MA 02134" "Zoom webinar. https://aclu.zoom.us/j/5351676736 Rapid City SD 57701" "West Ocean City Park and Ride. 12940 Inlet Isle Lane Ocean City MD 21842" "Rt126 x Rt135. Rt126 x Rt135 Framingham MA 01702" ...
$ hasCapacity : int 1 1 1 1 1 1 1 1 1 1 ...
$ city : chr "Boston" "Rapid City" "Ocean City" "Framingham" ...
$ state : chr "MA" "SD" "MD" "MA" ...
$ zip : chr "02134" "57701" "21842" "01702" ...
$ start_datetime : chr "2020-10-16 11:00:00.000000" "2020-10-16 10:00:00.000000" "2020-10-16 15:00:00.000000" "2020-10-16 17:00:00.000000" ...
$ starts_at_utc : chr "2020-10-16 15:00:00.000000" "2020-10-16 16:00:00.000000" "2020-10-16 19:00:00.000000" "2020-10-16 21:00:00.000000" ...
$ end_datetime : logi NA NA NA NA NA NA ...
$ categories : chr "oct-17-march" "oct-17-march" "oct-17-march" "oct-17-march" ...
$ event_is_virtual : int 0 1 0 0 0 0 0 0 0 0 ...
$ is_official : int 0 0 0 0 0 0 0 0 0 0 ...
$ is_team : int 0 0 0 0 0 0 0 0 0 0 ...
$ url : chr "https://act.womensmarch.org/event/oct-17-march/78/" "https://act.womensmarch.org/event/oct-17-march/404/" "https://act.womensmarch.org/event/oct-17-march/260/" "https://act.womensmarch.org/event/oct-17-march/224/" ...
$ start_datetime_formatted: chr "Friday Oct 16 11:00 AM" "Friday Oct 16 10:00 AM" "Friday Oct 16 3:00 PM" "Friday Oct 16 5:00 PM" ...
$ end_datetime_formatted : logi NA NA NA NA NA NA ...
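To get down to the fields the question asks for (date, location, title, virtual or in-person), something like the following might work. It is only a sketch: `tidy_events` is a hypothetical helper name, the column names are taken from the str() output above, and it assumes `event_is_virtual` is coded 0/1 and that `starts_at_utc` really is in UTC. The small `sample` data frame just copies two rows from that output so the example runs without the network.

```r
# Hypothetical helper: keep one row per event with only the requested fields.
tidy_events <- function(dat) {
  data.frame(
    title       = dat$title,
    # local start time as reported by the feed, left as text
    start_local = dat$start_datetime,
    # %OS accepts the fractional seconds present in the feed's timestamps
    start_utc   = as.POSIXct(dat$starts_at_utc,
                             format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
    city        = dat$city,
    state       = dat$state,
    # assumption: 1 means virtual, 0 means in-person
    is_virtual  = dat$event_is_virtual == 1,
    stringsAsFactors = FALSE
  )
}

# Two rows copied from the str() output above, for illustration only
sample <- data.frame(
  title            = c("Get Up, Stand Up - Stand Up for Your Rights!",
                       "Oceti Sakwin Womxn’s March 2020"),
  start_datetime   = c("2020-10-16 11:00:00.000000",
                       "2020-10-16 10:00:00.000000"),
  starts_at_utc    = c("2020-10-16 15:00:00.000000",
                       "2020-10-16 16:00:00.000000"),
  city             = c("Boston", "Rapid City"),
  state            = c("MA", "SD"),
  event_is_virtual = c(0L, 1L),
  stringsAsFactors = FALSE
)

out <- tidy_events(sample)
print(out)
```

With the real feed, `tidy_events(dat)` should return the same columns for every event; add `venue` to the helper if you want the full address text as well.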
Upvotes: 2