ulfelder

Reputation: 5335

How can I scrape event details from this web page?

Can anyone help me scrape the details of the events listed on this web page and return them in a data frame with one row per event?

After inspecting the page source to find what I thought was the right class attribute, I tried the following, but it returns an empty node set. The same happens with html_nodes(marches, ".event-list-item").

library(rvest)

marches <- read_html("https://map.womensmarch.com/?source=website")

events <- html_nodes(marches, "event-list-item")

I'd like the output to capture at least the date, location, title, and whether the event is virtual or in-person.

Upvotes: 0

Views: 455

Answers (1)

lroha

Reputation: 34591

You don't need to scrape the page in this case, and you can't with rvest alone, because the event data is loaded via JavaScript after the page loads, so it is not in the HTML that read_html() receives. If you inspect the page's network requests, you can see that the event info is JSON retrieved from another site, so you can read it directly:

library(jsonlite)

feed <- fromJSON("https://zen-hypatia-739ed6.netlify.app/feed")
dat <- feed$events

str(dat)

'data.frame':   313 obs. of  22 variables:
 $ id                      : int  78 404 260 224 286 108 187 265 326 334 ...
 $ public_description      : chr  "Meet up with signs for:\r\nVote  Biden, protect rights of people with disabilities,  protect Roe VS Wade, prote"| __truncated__ "The womxn of the Oceti Sakowin, the Seven Sacred Council Fires of the Great Sioux Nation are marching to the po"| __truncated__ "As part of Worcester County's regular Blue Honk and Wave sign holding event (every Friday until the election), "| __truncated__ "Standout for Social Justice \r\nWear Mask \r\nMaintain physical distance of at least 6 feet\r\nBring your signs"| __truncated__ ...
 $ campaign                : chr  "oct-17-march" "oct-17-march" "oct-17-march" "oct-17-march" ...
 $ lat                     : num  42.4 44.1 38.3 42.3 40.8 ...
 $ lng                     : num  -71.1 -103.2 -75.1 -71.4 -111.9 ...
 $ title                   : chr  "Get Up, Stand Up - Stand Up for Your Rights!" "Oceti Sakwin Womxn’s March 2020" "Honor RBG and Stand for Democracy" "Social Justice" ...
 $ event_doors_open_at     : logi  NA NA NA NA NA NA ...
 $ venue                   : chr  "Public island at a major 4 way stop. Intersection of North Harvard St and Western Ave Boston MA 02134" "Zoom webinar. https://aclu.zoom.us/j/5351676736 Rapid City SD 57701" "West Ocean City Park and Ride. 12940 Inlet Isle Lane Ocean City MD 21842" "Rt126 x Rt135. Rt126 x Rt135 Framingham MA 01702" ...
 $ hasCapacity             : int  1 1 1 1 1 1 1 1 1 1 ...
 $ city                    : chr  "Boston" "Rapid City" "Ocean City" "Framingham" ...
 $ state                   : chr  "MA" "SD" "MD" "MA" ...
 $ zip                     : chr  "02134" "57701" "21842" "01702" ...
 $ start_datetime          : chr  "2020-10-16 11:00:00.000000" "2020-10-16 10:00:00.000000" "2020-10-16 15:00:00.000000" "2020-10-16 17:00:00.000000" ...
 $ starts_at_utc           : chr  "2020-10-16 15:00:00.000000" "2020-10-16 16:00:00.000000" "2020-10-16 19:00:00.000000" "2020-10-16 21:00:00.000000" ...
 $ end_datetime            : logi  NA NA NA NA NA NA ...
 $ categories              : chr  "oct-17-march" "oct-17-march" "oct-17-march" "oct-17-march" ...
 $ event_is_virtual        : int  0 1 0 0 0 0 0 0 0 0 ...
 $ is_official             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ is_team                 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ url                     : chr  "https://act.womensmarch.org/event/oct-17-march/78/" "https://act.womensmarch.org/event/oct-17-march/404/" "https://act.womensmarch.org/event/oct-17-march/260/" "https://act.womensmarch.org/event/oct-17-march/224/" ...
 $ start_datetime_formatted: chr  "Friday Oct 16 11:00 AM" "Friday Oct 16 10:00 AM" "Friday Oct 16 3:00 PM" "Friday Oct 16 5:00 PM" ...
 $ end_datetime_formatted  : logi  NA NA NA NA NA NA ...
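
To get the one-row-per-event data frame asked for in the question (date, location, title, and virtual vs. in-person), you could pick out the relevant columns along these lines. This is only a sketch: so that it runs without hitting the feed, `dat` here is a small stand-in built from the first three rows of the str() output above, and the output column names (`date`, `location`, `format`) are my own choices. I'm also assuming `start_datetime` is the event's local start time and that `event_is_virtual == 1` means virtual.

```r
# Stand-in for feed$events: first three rows from the str() output above.
# With the real feed, use dat <- feed$events instead.
dat <- data.frame(
  title = c("Get Up, Stand Up - Stand Up for Your Rights!",
            "Oceti Sakwin Womxn’s March 2020",
            "Honor RBG and Stand for Democracy"),
  city = c("Boston", "Rapid City", "Ocean City"),
  state = c("MA", "SD", "MD"),
  start_datetime = c("2020-10-16 11:00:00.000000",
                     "2020-10-16 10:00:00.000000",
                     "2020-10-16 15:00:00.000000"),
  event_is_virtual = c(0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# One row per event, keeping only the fields asked for in the question
events <- data.frame(
  title    = dat$title,
  # keep just the "YYYY-MM-DD" part of the timestamp string
  date     = as.Date(substr(dat$start_datetime, 1, 10)),
  location = paste(dat$city, dat$state, sep = ", "),
  # assumption: 1 = virtual, 0 = in-person
  format   = ifelse(dat$event_is_virtual == 1, "virtual", "in-person"),
  stringsAsFactors = FALSE
)

print(events)
```

If you need the full street address rather than city and state, the `venue` column has it.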

Upvotes: 2
