Reputation: 1599
I'm trying to scrape Marvel movies together with their characters (featured, supporting, antagonists, other) from marvel.wikia.com. The characters live in lists in the DOM, and I can't find the right html_nodes() selector to get the list items underneath each character type.
The following code extracts every linked list item on the page, whereas I want only the ones belonging to the featured, supporting, antagonist, and other characters (the last category is not applicable for X2).
library(rvest)
library(tidyverse)
test_url <- "http://marvel.wikia.com/wiki/X2_(film)"
read_html(test_url) %>%
html_nodes("li > a") %>%
html_text()
Desired outcome:
# A tibble: 16 x 3
   movie type                  character
   <chr> <chr>                 <chr>
 1 X2    Featured Characters   Professor Charles Xavier
 2 X2    Featured Characters   Wolverine (Logan)
 3 X2    Featured Characters   Storm (Ororo Munroe)
 4 X2    Featured Characters   Dr. Jean Grey
 5 X2    Featured Characters   Cyclops (Scott Summers)
 6 X2    Featured Characters   Rogue (Marie)
 7 X2    Featured Characters   Iceman (Bobby Drake)
 8 X2    Supporting Characters Nightcrawler (Kurt Wagner)
 9 X2    Supporting Characters Pyro (John Allerdyce)
10 X2    Supporting Characters Mystique (Raven Darkholme)
11 X2    Supporting Characters Magneto (Erik Lehnsherr)
12 X2    Antagonists           Col. William Stryker
13 X2    Antagonists           Sgt. Lyman
14 X2    Antagonists           Unnamed Soldiers
15 X2    Antagonists           Deathstrike (Yuriko Oyama)
16 X2    Antagonists           Mutant 143 (Jason Stryker)
Upvotes: 1
Views: 149
Reputation: 11955
You could start with something like this:
library(rvest)
library(tidyverse)
test_url <- "http://marvel.wikia.com/wiki/X2_(film)"
# scrape the text of each top-level list in the article body
url_data <- read_html(test_url) %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/ul') %>%
  html_text()
# reshape the scraped data into the desired format, one row per character
# (assumes the first four lists on the page are the four character types, in this order)
df <- data.frame(movie = gsub(".*/", "", test_url),
                 type = c("Featured Characters", "Supporting Characters", "Antagonists", "Other Characters"),
                 characters = url_data[1:4]) %>%
  separate_rows(characters, sep = "\\n")
which gives
> head(df)
      movie                type                     characters
1 X2_(film) Featured Characters                          X-Men
2 X2_(film) Featured Characters       Professor Charles Xavier
3 X2_(film) Featured Characters              Wolverine (Logan)
4 X2_(film) Featured Characters           Storm (Ororo Munroe)
5 X2_(film) Featured Characters Dr. Jean Grey (Apparent death)
6 X2_(film) Featured Characters        Cyclops (Scott Summers)
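If you would rather not hard-code the type labels and the 1:4 index, one option is to read each heading from the page and pair it with the list that follows it. The sketch below is only an illustration: it runs against a small inline HTML snippet (sample_html), and the markup in that snippet (a bold heading paragraph directly followed by a ul, with characters possibly nested under a team name) is an assumption, not the verified structure of the wiki page. The XPath expressions would need adjusting after inspecting the real DOM.

```r
library(rvest)
library(tidyverse)

test_url <- "http://marvel.wikia.com/wiki/X2_(film)"

# Turn the last URL segment into the movie title: "X2_(film)" -> "X2"
movie <- test_url %>% basename() %>% str_remove("_\\(film\\)$")

# Hypothetical stand-in for the page markup (an assumption, not the real page)
sample_html <- '
<div id="mw-content-text">
  <p><b>Featured Characters:</b></p>
  <ul><li>X-Men
    <ul><li>Wolverine (Logan)</li><li>Storm (Ororo Munroe)</li></ul>
  </li></ul>
  <p><b>Antagonists:</b></p>
  <ul><li>Col. William Stryker</li></ul>
</div>'

page <- read_html(sample_html)

# One node per character-type heading (selector depends on the assumed markup)
headings <- html_nodes(page, xpath = '//*[@id="mw-content-text"]/p[b]')

characters <- bind_rows(lapply(headings, function(h) {
  # Leaf items of the first list after this heading; li[not(ul)] skips
  # grouping items such as a team name that only wrap a nested list
  items <- h %>%
    html_nodes(xpath = "following-sibling::ul[1]//li[not(ul)]") %>%
    html_text(trim = TRUE)
  tibble(movie = movie,
         type = sub(":$", "", html_text(h, trim = TRUE)),
         character = items)
}))

characters
```

Because the labels come from the headings themselves, a movie without an "Other Characters" section simply produces no rows for that type. To try this on the live page, replace read_html(sample_html) with read_html(test_url) and adapt the heading selector to whatever the page actually uses.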
Upvotes: 2