Reputation: 283
I am able to scrape the infobox on any Wikipedia page using rvest, but I can't get the same thing working on a Fandom wiki page.
The link: https://dc.fandom.com/wiki/Wonder_Woman_(Diana_Prince). The page has an infobox (which looks like a normal Wikipedia table), and its CSS selector appears to be ".pi-layout-default".
I want a data frame that contains the real name, aliases, etc.
Any idea how to do this?
Upvotes: 1
Views: 388
Reputation: 581
Use rvest and SelectorGadget!
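library(rvest)
library(tidyverse)

read_html("https://dc.fandom.com/wiki/Wonder_Woman_(Diana_Prince)") %>%
  # labels and values come back interleaved, in document order
  html_nodes(".pi-font , .pi-data-label") %>%
  html_text() %>%
  # two columns, filled row by row: label, then value
  matrix(ncol = 2, byrow = TRUE) %>%
  as_tibble()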
# A tibble: 21 x 2
   V1               V2
   <chr>            <chr>
 1 Real Name        Diana of Themyscira
 2 Current Alias    Wonder Woman
 3 Aliases          Diana Prince, Princess Diana, Miss America, Goddess of Truth, Dinanna Truthqueen
 4 Relatives        Ares (grandfather)[1]Hippolyta (mother)Antiope (aunt, deceased)Theseus (uncle by Antiope, deceased)Hippolytus (c~
 5 Affiliation      Justice League · formerly Department of Metahuman Affairs, Star Sapphire Corps, Female Furies, White Lantern Cor~
 6 Base Of Operatio~ Washington, D.C. · Themyscira · JLA Watchtower, Hall of Justice · formerly Boston, Gateway City
 7 Alignment        Good
 8 Identity         Public Identity
 9 Race             Amazon
10 Citizenship      Amazon
# ... with 11 more rows
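The matrix approach above depends on labels and values alternating strictly; a row with a label but no value (or the reverse) would shift every later pair. A possible alternative, sketched here under assumptions, is to select each label/value container first and read both parts from the same node. The class names `.pi-data` and `.pi-data-value`, and the inline HTML used as a stand-in for the page, are assumptions about the infobox markup and are not taken from the original post; check them against the live page before relying on them.

```r
library(rvest)

# Hypothetical stand-in for the infobox markup; the real page is assumed
# to follow a similar structure, but this has not been verified here.
html <- '<aside class="portable-infobox pi-layout-default">
  <div class="pi-item pi-data">
    <h3 class="pi-data-label">Real Name</h3>
    <div class="pi-data-value pi-font">Diana of Themyscira</div>
  </div>
  <div class="pi-item pi-data">
    <h3 class="pi-data-label">Current Alias</h3>
    <div class="pi-data-value pi-font">Wonder Woman</div>
  </div>
</aside>'

# Parse the literal HTML; pass the page URL instead to scrape the live site
page <- read_html(html)

# One node per label/value container inside the infobox
items <- html_nodes(page, ".pi-layout-default .pi-data")

# html_node() returns one result per container (missing if absent),
# so labels and values stay aligned even when one part is not present
infobox <- data.frame(
  label = html_text(html_node(items, ".pi-data-label"), trim = TRUE),
  value = html_text(html_node(items, ".pi-data-value"), trim = TRUE),
  stringsAsFactors = FALSE
)

infobox
```

Because `html_node()` is applied to the set of containers, a container without a value yields `NA` in that row instead of misaligning the rest of the table.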
Upvotes: 2