Reputation: 21
I want to crawl websites to collect information about different podcasts. I am interested in the title, date, and abstract of each show. My results are incomplete and contain a lot of blanks.
I tried multiple websites. Some work, but most don't. I also switched between the ExtractCSSPat and ExtractXpathPat arguments.
library(Rcrawler)
Rcrawler(Website = "https://www.futuretechpodcast.com/all-podcasts/", no_cores = 4, no_conn = 4, MaxDepth = 1,
         ExtractCSSPat = c(".podcast-hero-title", ".podcast-hero-date", ".content_text"),
         PatternsNames = c("Title", "Date", "Content"))
The resulting Excel sheet has some of the information I want, but most rows are empty. Also, only the information from the first page appears. With other websites this code was successful.
Is Rcrawler the right package?
I would like to get a complete Excel file including all dates, titles, and abstracts.
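To make the desired output concrete, here is a minimal sketch in base R of the table I am after. The `pages` list is a hypothetical stand-in for per-page crawl results, not real Rcrawler output, and the helper `page_to_row` and the file name `podcasts.csv` are made up for illustration. It fills missing fields with NA and drops pages where nothing was extracted.

```r
# Hypothetical stand-in for per-page crawl results (NOT real Rcrawler output):
# each element is one page, and some pages are missing fields.
pages <- list(
  list(Title = "Episode 1", Date = "2019-01-01", Content = "First abstract"),
  list(Title = "Episode 2", Date = NULL, Content = "Second abstract"),
  list(Title = NULL, Date = NULL, Content = NULL)
)

fields <- c("Title", "Date", "Content")

# Turn one page into a one-row data frame, using NA for any missing field.
page_to_row <- function(page) {
  values <- lapply(fields, function(f) {
    v <- page[[f]]
    if (is.null(v) || length(v) == 0 || !nzchar(v[1])) {
      NA_character_
    } else {
      as.character(v[1])
    }
  })
  names(values) <- fields
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Stack all pages into a single table.
podcasts <- do.call(rbind, lapply(pages, page_to_row))

# Drop rows where nothing was extracted (e.g. listing or navigation pages).
podcasts <- podcasts[rowSums(!is.na(podcasts)) > 0, , drop = FALSE]

# Write a CSV that Excel can open; the path is an illustrative choice.
out_file <- file.path(tempdir(), "podcasts.csv")
write.csv(podcasts, out_file, row.names = FALSE)
```

With this shape, a partially extracted episode keeps its row with NA in the missing column, while completely empty pages are removed instead of showing up as blank rows.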
Upvotes: 2
Views: 82