Reputation: 4040
I want to get the names of all the meals listed on Wikipedia:
https://en.wikipedia.org/wiki/Lists_of_prepared_foods
How can I query it in R?
There is a query function, but I can't find a good example of how to use it.
Upvotes: 0
Views: 440
Reputation: 9485
I know there is a package called WikipediR that helps, but rvest could also be helpful:
library(rvest)
URL <- "https://en.wikipedia.org/wiki/Lists_of_prepared_foods"
temp <- URL %>%
  read_html() %>%
  html_nodes("#mw-content-text h3+ ul a , .column-width a") %>%
  html_text()
temp
[1] "List of almond dishes" "List of ancient dishes" "List of avocado dishes"
[4] "List of bacon substitutes" "List of baked goods" "List of breakfast beverages"
[7] "List of breakfast cereals" "List of breakfast foods" "List of cabbage dishes"
[10] "List of cakes" "List of candies" "List of carrot dishes" ... (trunc. output)
EDIT
To scrape the names on each page, I advise you to loop over the links. Reuse the pipeline that produced temp above, but extract the links instead of the text:
temp <- URL %>%
  read_html() %>%
  html_nodes("#mw-content-text h3+ ul a , .column-width a") %>%
  html_attr("href")
temp
[1] "/wiki/List_of_almond_dishes" "/wiki/List_of_ancient_dishes"
[3] "/wiki/List_of_avocado_dishes" "/wiki/List_of_bacon_substitutes" ... (trunc. output)
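The scraped hrefs are site-relative, so they need the host in front before they can be fetched. A minimal sketch of that step in base R, assuming every link of interest starts with "/wiki/"; the names base_url and to_absolute are made up for illustration and are not part of rvest:

```r
# Hypothetical helper: turn site-relative hrefs into full Wikipedia URLs.
base_url <- "https://en.wikipedia.org"

to_absolute <- function(hrefs, base = base_url) {
  # keep only internal article links and drop duplicates
  hrefs <- unique(hrefs[grepl("^/wiki/", hrefs)])
  # the hrefs already start with "/", so the base must not end with one
  paste0(base, hrefs)
}

# sample input taken from the output above, with one duplicate added
hrefs <- c("/wiki/List_of_almond_dishes",
           "/wiki/List_of_ancient_dishes",
           "/wiki/List_of_almond_dishes")
urls <- to_absolute(hrefs)
print(urls)
```

Dropping duplicates up front avoids fetching the same page twice, which matters when each request is followed by a pause.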
Now you create an empty list to populate with the foods for each link:
# an empty list, filled with one character vector per link
listed <- list()
for (i in temp) {
  # build the full URL; the scraped hrefs already start with "/"
  url <- paste0("https://en.wikipedia.org", i)
  # each URL becomes one component of the list, holding the extracted names
  listed[[i]] <- url %>%
    read_html() %>%
    # check that these selectors match the right nodes; they seem to
    html_nodes("h2~ ul li > a:nth-child(1) , a a") %>%
    html_text()
  # very important: pause 15 seconds after each page so the site is not
  # flooded with requests in a short period of time
  Sys.sleep(15)
}
As a result:
$`/wiki/List_of_almond_dishes`
[1] "Ajoblanco" "Almond butter" "Alpen (food)" "Amandine (culinary term)" "Amlu"
[6] "Bakewell tart" "Bear claw (pastry)" "Bethmännchen" "Biscuit Tortoni" "Blancmange"
[11] "Christmas cake" "Churchkhela" "Ciarduna" "Colomba di Pasqua" "Comfit"
[16] "Coucougnette" "Crème de Noyaux" "Cruncheroos" "Dacquoise" "Daim bar"
[21] "Dariole" "Esterházy torte" ... (trunc. output)
$`/wiki/List_of_ancient_dishes`
[1] "Anfu ham" "Babaofan" "Bread" "Flatbread" "Focaccia" "Mantou"
[7] "Chili pepper" "Chutney" "Congee" "Curry" "Doubanjiang" "Fish sauce"
[13] "Forcemeat" "Garum" "Ham" "Harissa" "Jeok" "Jusselle"
[19] "Liquamen" "Maccu" "Misu karu" "Moretum" "Nian gao" "Noodle" ... (trunc. output)
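Since the question asks for all the names, the per-page list can be collapsed afterwards. A minimal sketch in base R, assuming listed has the shape shown above (a named list of character vectors); the sample values are a small subset of the output, and all_names and meals are illustrative names:

```r
# A small stand-in for the scraped result, using values from the output above.
listed <- list(
  "/wiki/List_of_almond_dishes"  = c("Ajoblanco", "Almond butter", "Amlu"),
  "/wiki/List_of_ancient_dishes" = c("Anfu ham", "Babaofan", "Bread")
)

# one character vector with every name, duplicates removed
all_names <- unique(unlist(listed, use.names = FALSE))

# long table that keeps track of which list page each name came from
meals <- data.frame(
  page = rep(names(listed), lengths(listed)),
  name = unlist(listed, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(all_names)
print(meals)
```

The data frame form is handy if the same dish appears on several list pages and you want to keep that information rather than de-duplicate it away.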
Upvotes: 2