A Berg

Reputation: 21

Rcrawler does not collect all pages

I want to crawl websites to collect information about different podcasts. I am interested in the title, date, and abstract of each show. My results are incomplete and contain a lot of blanks.

I tried multiple websites. Some work, but most don't. I also switched between the ExtractCSSPat and ExtractXpathPat arguments.

Rcrawler(Website = "https://www.futuretechpodcast.com/all-podcasts/",
         no_cores = 4, no_conn = 4,
         ExtractCSSPat = c(".podcast-hero-title", ".podcast-hero-date", ".content_text"),
         PatternsNames = c("Title", "Date", "Content"),
         MaxDepth = 1)

The resulting Excel sheet has some of the information I want, but most rows are empty. Also, only the information from the first page appears. With other websites this code was successful.
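For comparison, the XPath variant mentioned earlier could be sketched roughly as follows. It assumes the three CSS classes translate one-to-one into class-based XPath expressions; `class_to_xpath` and `run_crawl` are hypothetical helper names used only for illustration, and the crawl itself is wrapped in a function so nothing is fetched until it is called.

```r
# Hypothetical helper: build an XPath expression that matches any element
# carrying the given CSS class (whole-word match on the class attribute).
class_to_xpath <- function(cls) {
  sprintf("//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]", cls)
}

# The same three selectors as in the CSS version, expressed as XPath.
xpaths <- vapply(
  c("podcast-hero-title", "podcast-hero-date", "content_text"),
  class_to_xpath,
  character(1),
  USE.NAMES = FALSE
)

# Hypothetical wrapper: same settings as the CSS call, but using
# ExtractXpathPat instead of ExtractCSSPat.
run_crawl <- function() {
  library(Rcrawler)
  Rcrawler(Website = "https://www.futuretechpodcast.com/all-podcasts/",
           no_cores = 4, no_conn = 4,
           ExtractXpathPat = xpaths,
           PatternsNames = c("Title", "Date", "Content"),
           MaxDepth = 1)
}
```

Calling `run_crawl()` starts the crawl with the same depth and connection settings as the CSS version, so the two result sets can be compared directly.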

Is Rcrawler the right package?

I would like to get a complete Excel file including all dates, titles, and abstracts.

Upvotes: 2

Views: 82

Answers (0)
