Reputation: 25
I am trying to build a data frame from data on the PubMed website. I have a search results page that links to one subpage per article, and I want to scrape the abstract text from each of those subpages. My code does not work: I cannot retrieve the abstract text. I have looked at other questions here but could not solve the issue. Here is my code so far.
library(xml2)
library(rvest)
library(tibble)
library(dplyr)
library(tidyverse)
link <- "https://pubmed.ncbi.nlm.nih.gov/?term=((((((%E2%80%98Food%20Supply%E2%80%99%20(MeSH))%20OR%20%E2%80%98Food%20Storage%E2%80%99%20(MeSH))%20OR%20%E2%80%98Hunger%E2%80%99(MeSH)%20OR%20food%20security%20OR%20food%20insecurity%20OR%20household%20food%20security%20OR%20global%20food%20security)%20OR%20household%20food%20insecurity)))%20AND%20((%E2%80%98Prevalence%E2%80%99%20(MeSH))%20OR%20%E2%80%98Cross-Sectional%20Studies%E2%80%99%20(MeSH)%20OR%20cross-sectional%20study%20OR%20Prevalence%20Studies%20OR%20prevalence%20study%20OR%20Cross-Sectional%20Analyses%20OR%20CrossSectional%20Analysis%20OR%20Cross%20Sectional%20Analysis%20OR%20Cross%20Sectional%20Analyses)&filter=lang.english&filter=lang.portuguese"
# I start building variables for a data frame.
page <- read_html(link)
name <- page %>%
html_nodes(".docsum-title") %>%
html_text()
name_links_synopsis <- page %>% # This takes all the links to the subpages
html_nodes(".docsum-title") %>%
html_attr("href") %>%
paste("https://pubmed.ncbi.nlm.nih.gov", ., sep="")
authors <- page %>%
html_nodes(".full-authors") %>%
html_text()
PMID <- page %>%
html_nodes(".docsum-pmid") %>%
html_text()
synopsis <- page %>%
html_nodes(".full-view-snippet") %>%
html_text()
pubmed <- data.frame(name, authors, name_links_synopsis, PMID, synopsis,
stringsAsFactors = FALSE)
# I create a function to scrape the text of the abstract in every subpage
get_pubmed = function(pubmed_link) {
pubmed_link = "https://pubmed.ncbi.nlm.nih.gov/?term=((((((%E2%80%98Food%20Supply%E2%80%99%20(MeSH))%20OR%20%E2%80%98Food%20Storage%E2%80%99%20(MeSH))%20OR%20%E2%80%98Hunger%E2%80%99(MeSH)%20OR%20food%20security%20OR%20food%20insecurity%20OR%20household%20food%20security%20OR%20global%20food%20security)%20OR%20household%20food%20insecurity)))%20AND%20((%E2%80%98Prevalence%E2%80%99%20(MeSH))%20OR%20%E2%80%98Cross-Sectional%20Studies%E2%80%99%20(MeSH)%20OR%20cross-sectional%20study%20OR%20Prevalence%20Studies%20OR%20prevalence%20study%20OR%20Cross-Sectional%20Analyses%20OR%20CrossSectional%20Analysis%20OR%20Cross%20Sectional%20Analysis%20OR%20Cross%20Sectional%20Analyses)&filter=lang.english&filter=lang.portuguese"
pubmed_page = read_html(pubmed_link)
pubmed_abs = pubmed_page %>% html_nodes(".docsum-title , .docsum-title b") %>%
html_text()
pubmed_abs_tot = name_links_synopsis %>% html_nodes("#eng-abstract p") %>%
html_text()
return(pubmed_abs_tot)
}
Upvotes: 0
Views: 95
Reputation: 6563
This scrapes each result card into one row and then visits every article link; you'll find the abstract for each article in the last column.
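As for why the function in the question fails: it overwrites its `pubmed_link` argument with the search URL, and it passes `name_links_synopsis`, a character vector of URLs, straight to `html_nodes()`, which expects a parsed document. A minimal offline sketch of that second problem, assuming rvest 1.0 or later (the URL and the HTML snippet are made up for illustration and nothing is fetched):

```r
library(rvest)

# A URL is just a string; rvest selectors need a parsed document.
url_string <- "https://pubmed.ncbi.nlm.nih.gov/00000000/"  # made-up URL, never fetched

# Selecting on the string fails, which is what happens inside get_pubmed().
result <- tryCatch(
  html_elements(url_string, "#eng-abstract p"),
  error = function(e) "error: not a parsed document"
)
result

# Parsing first (here from an inline snippet instead of read_html(url)) works.
doc <- minimal_html('<div id="eng-abstract"><p>Example abstract text.</p></div>')
abstract <- doc %>%
  html_elements("#eng-abstract p") %>%
  html_text2()
abstract
```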
library(tidyverse)
library(rvest)
page <- "https://pubmed.ncbi.nlm.nih.gov/?term=((((((%E2%80%98Food%20Supply%E2%80%99%20(MeSH))%20OR%20%E2%80%98Food%20Storage%E2%80%99%20(MeSH))%20OR%20%E2%80%98Hunger%E2%80%99(MeSH)%20OR%20food%20security%20OR%20food%20insecurity%20OR%20household%20food%20security%20OR%20global%20food%20security)%20OR%20household%20food%20insecurity)))%20AND%20((%E2%80%98Prevalence%E2%80%99%20(MeSH))%20OR%20%E2%80%98Cross-Sectional%20Studies%E2%80%99%20(MeSH)%20OR%20cross-sectional%20study%20OR%20Prevalence%20Studies%20OR%20prevalence%20study%20OR%20Cross-Sectional%20Analyses%20OR%20CrossSectional%20Analysis%20OR%20Cross%20Sectional%20Analysis%20OR%20Cross%20Sectional%20Analyses)&filter=lang.english&filter=lang.portuguese" %>%
read_html()
df <- page %>%
  html_elements(".docsum-content") %>%
  map_dfr(~ tibble(
    title = .x %>%
      html_element(".docsum-title") %>%
      html_text2(),
    authors = .x %>%
      html_element(".full-authors") %>%
      html_text2(),
    PMID = .x %>%
      html_element(".docsum-pmid") %>%
      html_text2(),
    synopsis = .x %>%
      html_element(".full-view-snippet") %>%
      html_text2(),
    link = .x %>%
      html_element(".docsum-title") %>%
      html_attr("href") %>%
      str_c("https://pubmed.ncbi.nlm.nih.gov", .)
  ))
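Iterating over each `.docsum-content` card, as above, keeps the columns aligned even when a card lacks one of the fields, whereas collecting each field as a separate page-wide vector can give vectors of different lengths that `data.frame()` refuses to combine. A small offline sketch of that behaviour, using a made-up two-card snippet (the HTML and the name `demo_page` are assumptions for illustration, not PubMed's actual markup):

```r
library(rvest)
library(tibble)

# Two result cards; the second one has no snippet.
demo_page <- minimal_html('
  <div class="docsum-content">
    <a class="docsum-title" href="/1/">First title</a>
    <div class="full-view-snippet">First snippet</div>
  </div>
  <div class="docsum-content">
    <a class="docsum-title" href="/2/">Second title</a>
  </div>
')

# Page-wide selection: 2 titles but only 1 snippet, so the lengths differ.
titles_all <- demo_page %>% html_elements(".docsum-title") %>% html_text2()
snippets_all <- demo_page %>% html_elements(".full-view-snippet") %>% html_text2()

# Per-card selection: html_element() gives one result per card, NA if absent.
cards <- demo_page %>% html_elements(".docsum-content")
aligned <- tibble(
  title = cards %>% html_element(".docsum-title") %>% html_text2(),
  synopsis = cards %>% html_element(".full-view-snippet") %>% html_text2()
)
aligned
```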
get_abstract <- function(link) {
  cat("Scraping:", link, "\n")
  link %>%
    read_html() %>%
    html_elements(".abstract-content.selected") %>%
    html_text2()
}
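Because the results are collected with `map_chr()`, `get_abstract()` has to return exactly one string per link; the run would stop with an error if an article page had no matching element or more than one. A hedged sketch of a more defensive variant follows. `extract_abstract()` and `get_abstract_safe()` are hypothetical names, and the one-second pause is an arbitrary courtesy delay, not a documented PubMed requirement.

```r
library(rvest)

# Hypothetical helper: pull the abstract out of an already-parsed page.
extract_abstract <- function(page) {
  nodes <- html_elements(page, ".abstract-content.selected")
  if (length(nodes) == 0) {
    return(NA_character_)  # no abstract on this page
  }
  paste(html_text2(nodes), collapse = "\n")  # always a single string
}

# Hypothetical wrapper: fetch one article page, pausing before each request.
get_abstract_safe <- function(link, delay = 1) {
  Sys.sleep(delay)
  tryCatch(
    extract_abstract(read_html(link)),
    error = function(e) NA_character_  # network or parse failure
  )
}

# Offline check with made-up snippets, no network access needed.
with_abs <- minimal_html(
  '<div class="abstract-content selected"><p>Background: text.</p></div>'
)
without_abs <- minimal_html("<p>No abstract available</p>")
extract_abstract(with_abs)
extract_abstract(without_abs)
```

It could be used in place of `get_abstract` in the `map_chr()` call, leaving `NA` in the abstract column for pages that fail or have no abstract.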
# Assign the result so the abstract column is kept for the later steps
df <- df %>%
  mutate(
    abstract = map_chr(link, get_abstract)
  )
df
# A tibble: 10 × 6
title authors PMID synop…¹ link abstr…²
<chr> <chr> <chr> <chr> <chr> <chr>
1 Food Insecurity and Obesity in US Adolescents: A Population-Based Analysis. Fleming MA, Kane… 3348… "Preva… http… "Backg…
2 Food insecurity and mental health during the COVID-19 pandemic. Polsky JY, Gilmo… 3332… "This … http… "Backg…
3 Household Food Security and Associated Factors among Portuguese Children. Silva MG, Machad… 3493… "This … http… "This …
4 Food Insecurity and Cardiometabolic Markers: Results From the Study of Latino Youth. Maldonado LE, So… 3529… "METHO… http… "Objec…
5 Persistent and Episodic Food Insecurity and Associated Coping Strategies Among College Students. Mitchell A, Elli… 3618… "OBJEC… http… "Objec…
6 Food Insecurity: Child Care Programs' Perspectives. Noerper TE, Elmo… 3499… "BACKG… http… "Backg…
7 Food in the cold: exploring food security and sovereignty in Whitehorse, Yukon. Blom CDB, Steege… 3508… "This … http… "Harsh…
8 Food insecurity among Finnish private service sector workers: validity, prevalence and determinants. Walsh HM, Nevala… 3506… "OBJEC… http… "Objec…
9 Food insecurity in baccalaureate nursing students: A cross-sectional survey. Cockerham M, Cam… 3386… "METHO… http… "Backg…
10 Household food insecurity and educational outcomes in school-going adolescents in Ghana. Masa R, Chowa G. 3271… "We me… http… "Objec…
# … with abbreviated variable names ¹synopsis, ²abstract
Abstract of the first article:
df %>%
slice(1) %>%
pull(abstract)
"Background: Food insecurity and obesity are significant problems affecting adolescents. There is a paucity of recent data examining this relationship. This study utilizes a recent nationally representative sample of US adolescents to examine the relationship between obesity and food security status, as well as other risk factors. Methods: A cross-sectional analysis of 4777 US adolescents (13-18 years old) was performed using data from the National Health and Nutrition Examination Surveys 2007-2016. Prevalence of obesity based on food security status was calculated. Multivariable logistic regression was performed to examine characteristics of adolescents in relationship to obesity. Results: The prevalence of obesity among adolescents from food insecure households was significantly higher compared to those who were not, with a prevalence ratio of 1.3 (95% CI: 1.2-1.5, p < 0.0001). Food insecurity was associated with a higher unadjusted rate of obesity, with an odds ratio of 1.4 (95% CI: 1.2-1.7, p = 0.0002). After adjustment for potential confounding factors, food insecurity was no longer significantly associated with obesity (OR 1.19, 95% CI: 1.0-1.4, p = 0.08). However, other factors such as black race, Hispanic ethnicity, male sex, and households with a monthly income ≤185% of the poverty line were associated with increased odds of obesity. Conclusions: While the prevalence of obesity in adolescents from food insecure households was higher compared to those who were not, no association between the two was found when accounting for other risk factors. Data on independent food-seeking behaviors of adolescents may help clarify this complex relationship in future work."
Upvotes: 1