Webscraping with R

Question

I want to scrape this web page with R and rvest. I want to extract the fifty words in this format:

So far I have been able to do this:

library(rvest)
library(dplyr)
words<-read_html("https://www.education.com/magazine/article/Ed_50_Words_SAT_Loves_2/")
just_words<-words %>% html_nodes("ol") %>% html_text()
strsplit(just_words, "\n\t")

That only gets me as far as a raw output containing all 50 words.

Any help?

EDIT

I am well aware of the Terms of Use of the website. This question is solely for practice purposes to improve scraping skills.

Rohan · Accepted Answer

I converted the just_words list into a data frame and then used separate() from the tidyr package to split the column into Word and Meaning.

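library(rvest)
library(dplyr)
library(stringr)
library(tidyr)

# Download the page and pull the text of the ordered list
words <- read_html("https://www.education.com/magazine/article/Ed_50_Words_SAT_Loves_2/")
just_words <- words %>% html_nodes("ol") %>% html_text()

# Split the text into one entry per list item and store it as a one-column data frame
x <- as.data.frame(strsplit(just_words, "\n\t"), col.names = "V1")
head(x)

# Split each entry at the first non-alphanumeric run: word on the left, meaning on the right
t <- x %>% separate(V1, c("Word", "Meaning"), extra = "merge", fill = "left")
head(t)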

Output:

> head(t)
        Word                                             Meaning
1   abstract                                        not concrete
2  aesthetic        having to do with the appreciation of beauty
3  alleviate                          to ease a pain or a burden
4 ambivalent simultaneously feeling opposing feelings; uncertain
5  apathetic                   feeling or showing little emotion
6 auspicious                                favorable; promising
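
To see why this call cuts each entry only once, here is a small sketch on made-up rows (the data frame below is illustrative, not scraped from the page). It assumes the default sep of separate(), which matches any run of non-alphanumeric characters; extra = "merge" keeps everything after the first split in the last column, and fill = "left" pads an entry with no separator with NA on the left.

```r
library(tidyr)

# Toy rows shaped like the scraped items: a word, a space, then its meaning.
# The third row has no separator at all (hypothetical edge case).
x <- data.frame(
  V1 = c("abstract not concrete",
         "auspicious favorable; promising",
         "orphan"),
  stringsAsFactors = FALSE
)

# extra = "merge": split at most once, so the meaning keeps its spaces and semicolons.
# fill = "left": a row with a single piece gets NA in Word and the piece in Meaning.
out <- separate(x, V1, c("Word", "Meaning"), extra = "merge", fill = "left")
out
```

If an entry could start with whitespace, trimming it first (for example with trimws()) would probably be needed, since the leading gap would otherwise count as the first separator.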

If you are looking for more nicely formatted output, use the pander package. The output is displayed as below:

> library(pander)
> pander(head(t))

---------------------------------------
   Word              Meaning           
---------- ----------------------------
 abstract          not concrete        

aesthetic     having to do with the    
              appreciation of beauty   

alleviate   to ease a pain or a burden 

ambivalent    simultaneously feeling   
           opposing feelings; uncertain

apathetic   feeling or showing little  
                     emotion           

auspicious     favorable; promising    
---------------------------------------

To remove line breaks and surrounding spaces from the Meaning column:

t <- t %>% mutate(Meaning = trimws(gsub("[\r\n]", "", Meaning)))
tail(t)
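
As a possible alternative, selecting the individual list items instead of the whole list may avoid splitting on whitespace altogether. The sketch below runs on an inline HTML snippet that stands in for the page; the assumption that each word sits in its own li element inside the ol, and the name words_df, are mine and not confirmed against the real markup.

```r
library(rvest)

# Inline stand-in for the page (assumed structure; the real markup may differ)
html <- '<ol>
  <li>abstract not concrete</li>
  <li>aesthetic having to do with the appreciation of beauty</li>
  <li>alleviate to ease a pain or a burden</li>
</ol>'

page <- read_html(html)

# One string per list item, with leading and trailing whitespace trimmed
items <- page %>% html_nodes("ol li") %>% html_text(trim = TRUE)

# Word = everything before the first whitespace; Meaning = everything after it
words_df <- data.frame(
  Word    = sub("\\s.*$", "", items),
  Meaning = trimws(sub("^\\S+\\s+", "", items)),
  stringsAsFactors = FALSE
)
words_df
```

To try this on the real page, replace html with the URL used above and check that html_nodes("ol li") returns the 50 entries before building the data frame.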
