Reputation: 55
I need to get the weblinks for all followers listed on the following page:
https://www.researchgate.net/topic/biotechnology
There are 206770 followers for this topic at the moment. When I click the "View all" button, a popup appears with a list that keeps expanding as I scroll down.
https://www.researchgate.net/profile/Kestutis_Sasnauskas ...
The above are the links for the top follower. Is there a way we can get the weblinks for all 206770 followers?
Upvotes: 4
Views: 123
Reputation: 30425
The server returns the data as JSON if you ask for it. Subsequent calls use an offset parameter that the previous JSON response supplies. In the example below I have requested just the first 10 offsets, which is equivalent to scrolling down 10 times. There is a lot more data than just the profile web links:
library(RCurl)
library(XML)
library(jsonlite)

# get initial page and read the follower count from the "View all" link
initURL <- "http://www.researchgate.net/topic/biotechnology"
doc <- htmlParse(initURL)
noFollowers <- doc["//*/strong/*/a[@class='js-see-all']", fun = xmlValue][[1]]
noFollowers <- as.integer(gsub("[^0-9]", "", noFollowers))

# endpoint behind the "View all" dialog; the accept header asks for JSON
appURL <- "http://www.researchgate.net/publictopics.KeywordFollowersPeopleList.html?view=dialog&showFollowButton=1&followEvent=tp_followers_xflw&keywordId=4f15497280e582373c000000"
appData <- getURL(appURL, httpheader = c(accept = "application/json"))
follData <- list(fromJSON(appData)$result$data$content$data$listItems)

# each response supplies the offset to use for the next request
for (i in 1:10) {
  nextOffset <- fromJSON(appData)$result$data$nextOffset
  appData <- getURL(paste0(appURL, "&offset=", nextOffset),
                    httpheader = c(accept = "application/json"))
  follData[[i + 1]] <- fromJSON(appData)$result$data$content$data$listItems
}

# collect the profile URLs from every batch
followers <- na.omit(do.call(c, lapply(follData, function(x) x$data$url)))
> head(followers)
[1] "profile/Subhashish_Dutta" "profile/Jerome_Wang3" "profile/Jose_Carbajo2"
[4] "profile/Daniele_Riccio" "profile/Fiona_Togneri2" "profile/Sukanya_Patel"
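The values in followers are paths relative to the site root, while the question asks for full web links. A minimal sketch of the conversion, assuming the base URL is the same host used in initURL (baseURL and profileLinks are names introduced here for illustration):

```r
# Relative paths copied from the head(followers) output above
followers <- c("profile/Subhashish_Dutta",
               "profile/Jerome_Wang3",
               "profile/Jose_Carbajo2")

# Assumption: profile paths are relative to the host used in initURL
baseURL <- "http://www.researchgate.net/"

# unique() guards against a profile being returned by more than one batch
profileLinks <- unique(paste0(baseURL, followers))
print(profileLinks)
```

The same two lines should work unchanged on the full followers vector built by the loop.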
Upvotes: 0
Reputation: 5951
This can be done with rvest and RSelenium. The latter is the one that is mostly needed; the former will make your life easier. Install RSelenium from GitHub with devtools::install_github("ropensci/RSelenium") and rvest from CRAN.
Here is the code you need to accomplish what you are looking for.
siteUrl <- "http://www.researchgate.net/"
GateUrl <- "http://www.researchgate.net/publictopics.KeywordFollowersPeopleList.html?view=dialog&showFollowButton=1&followEvent=tp_followers_xflw&keywordId=4f15497280e582373c000000&offset="

library(rvest)
library(RSelenium)

# start a local Selenium server and open a browser session
checkForServer()
startServer()
remDrv <- remoteDriver()
remDrv$open(silent = FALSE)

i <- 0                 # offset passed to the follower list
profileUrls <- c()
for (j in 1:3) {
  print(j)
  remDrv$navigate(paste0(GateUrl, i))
  # parse the rendered page (html() was later renamed read_html() in rvest)
  l <- html(remDrv$getPageSource()[[1]])
  # each follower's name links to the profile; prepend the site root
  profileUrls <- c(profileUrls,
                   paste0(siteUrl, l %>% html_nodes(".display-name") %>% xml_attr("href")))
  # next offset: one past the number of links collected so far
  i <- length(profileUrls) + 1
}
remDrv$close()
profileUrls
A couple of things here. First, you need to figure out the j loop. I think it picks up 38 profiles with each URL, so the loop should be something like for(j in 1:ceiling(followers/38)).

The second point is that the code is not very efficient in the way it saves the links, i.e. it appends to the vector on every iteration. A better solution would be to use lapply and then unlist.

Last point: you need Mozilla Firefox on your machine, since this is the default browser used by RSelenium, though you can set it to use whichever of the most popular browsers you have.
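The first two points above (the number of iterations, and collecting with lapply and unlist rather than appending) can be sketched as follows. This is only an illustration: fetchPage is a hypothetical stand-in for the navigate-and-scrape step, the fixed step of 38 is this answer's estimate rather than a verified page size, and the offsets here advance by that fixed step instead of by the number of links collected.

```r
# Hypothetical stand-in for the Selenium step; a real version would call
# remDrv$navigate() and html_nodes() as in the loop above.
fetchPage <- function(offset) {
  paste0("http://www.researchgate.net/profile/user_", offset + 0:2)
}

followers <- 206770   # follower count quoted in the question
perPage   <- 38       # estimate from this answer, not a verified value

# number of requests needed to cover every follower
nPages <- ceiling(followers / perPage)

# fetch only the first three pages to keep the example small
offsets <- (seq_len(3) - 1) * perPage
pages <- lapply(offsets, fetchPage)

# flatten the list of per-page vectors and drop repeated links
profileUrls <- unique(unlist(pages))
print(nPages)
print(length(profileUrls))
```

Building the list first and flattening once avoids reallocating profileUrls on every iteration, which matters when the loop runs thousands of times.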
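Results from the first 56 entries (some profiles appear twice, e.g. entries 14-18 repeat as 19-23, so consider applying unique()):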
> profileUrls
[1] "http://www.researchgate.net/profile/Jose_Carbajo2"
[2] "http://www.researchgate.net/profile/Daniele_Riccio"
[3] "http://www.researchgate.net/profile/Fiona_Togneri2"
[4] "http://www.researchgate.net/profile/Sukanya_Patel"
[5] "http://www.researchgate.net/profile/Neri_Fattorini"
[6] "http://www.researchgate.net/profile/Pham_Thi_Thuy_Van"
[7] "http://www.researchgate.net/profile/Kestutis_Sasnauskas"
[8] "http://www.researchgate.net/profile/Iris_Weintal"
[9] "http://www.researchgate.net/profile/Godelieve_Verhaegen"
[10] "http://www.researchgate.net/profile/Janani_Venkatraman2"
[11] "http://www.researchgate.net/profile/Kai_Wang126"
[12] "http://www.researchgate.net/profile/Irine_Ronin"
[13] "http://www.researchgate.net/profile/Natasha_Ikhsan"
[14] "http://www.researchgate.net/profile/Nadya_Hajar"
[15] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[16] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[17] "http://www.researchgate.net/profile/Wei_Leiyan"
[18] "http://www.researchgate.net/profile/Yosuke_Inada"
[19] "http://www.researchgate.net/profile/Nadya_Hajar"
[20] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[21] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[22] "http://www.researchgate.net/profile/Wei_Leiyan"
[23] "http://www.researchgate.net/profile/Yosuke_Inada"
[24] "http://www.researchgate.net/profile/Yongning_You"
[25] "http://www.researchgate.net/profile/Susan_Hu6"
[26] "http://www.researchgate.net/profile/Matt_Evans11"
[27] "http://www.researchgate.net/profile/Nam_Kieu"
[28] "http://www.researchgate.net/profile/Nur_Musa3"
[29] "http://www.researchgate.net/profile/Varaporn_S"
[30] "http://www.researchgate.net/profile/Askar_Begzat3"
[31] "http://www.researchgate.net/profile/Bing_Wang63"
[32] "http://www.researchgate.net/profile/Xuebin_Yan"
[33] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[34] "http://www.researchgate.net/profile/Stephen_Heimann"
[35] "http://www.researchgate.net/profile/Hanina_Hanifa"
[36] "http://www.researchgate.net/profile/Bo_Wang143"
[37] "http://www.researchgate.net/profile/Xuebin_Yan"
[38] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[39] "http://www.researchgate.net/profile/Stephen_Heimann"
[40] "http://www.researchgate.net/profile/Hanina_Hanifa"
[41] "http://www.researchgate.net/profile/Bo_Wang143"
[42] "http://www.researchgate.net/profile/Huili_Li5"
[43] "http://www.researchgate.net/profile/Giuseppe_Infusini"
[44] "http://www.researchgate.net/profile/Carmen_Wacher"
[45] "http://www.researchgate.net/profile/Linyn_Linyn"
[46] "http://www.researchgate.net/profile/Dan_Youel"
[47] "http://www.researchgate.net/profile/Catherine_Williams16"
[48] "http://www.researchgate.net/profile/Nichole_Macaraeg"
[49] "http://www.researchgate.net/profile/Peter_Oroszlan"
[50] "http://www.researchgate.net/profile/Eduard_Karamov"
[51] "http://www.researchgate.net/profile/Mauricio_Franco3"
[52] "http://www.researchgate.net/profile/Patricia_Zancan"
[53] "http://www.researchgate.net/profile/Rohana_Dassanayake"
[54] "http://www.researchgate.net/profile/Khadija_Khataby"
[55] "http://www.researchgate.net/profile/Imane_Moest"
[56] "http://www.researchgate.net/profile/Rory_Adey"
Upvotes: 1
Reputation: 54247
As an alternative to RSelenium, you could try it like this (first 56 followers as an example):
library(XML)
library(jsonlite)

# offsets 1, 19 and 37: three requests
offsets <- seq(from = 1, to = 50, by = 18)
urls <- sprintf("http://www.researchgate.net/publictopics.KeywordFollowersPeopleList.html?view=dialog&showFollowButton=1&followEvent=tp_followers_xflw&keywordId=4f15497280e582373c000000&offset=%d", offsets)
df <- data.frame()
for (x in seq_along(urls)) {
  doc <- htmlParse(urls[x])
  # the widget data is embedded in the script matched by //script[5]
  script <- as(doc[['//script[5]']], "character")
  # split at each createInitialWidget(...) call; drop the text before the first call
  splits <- strsplit(script, '\\(function\\(\\)\\{Y\\.rg\\.createInitialWidget\\("[^\"]+",')[[1]][-1]
  res <- lapply(splits, function(split) {
    # strip the trailing JavaScript and all backslashes, then parse the JSON
    split <- sub(");})();\n", "", split, fixed = TRUE)
    res <- try(as.data.frame(t(unlist(fromJSON(gsub("\\\\", "", split))))), silent = TRUE)
    if (!inherits(res, "try-error")) return(res) else return(NULL)
  })
  # keep all but the last two widgets of each page
  df <- rbind(df, do.call(rbind, res[1:(length(res) - 2)]))
}
dplyr::glimpse(df)
# Observations: 56
# Variables:
# $ _isReact (fctr) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
# $ data.displayName (fctr) Jose Maria Carbajo, Daniele Riccio, Fiona S Togneri, Sukanya Paramashivaiah Patel, Neri Fattorini, Pham thi thuy van, Kestutis Sasnauskas, Iris Weintal, Godelieve Verhaegen, Ja...
# $ data.profile.professionalInstitution.professionalInstitutionName (fctr) Instituto Nacional de Investigaciu00f3n y Tecnologu00eda Agraria y Alimentaria, University of Milan, Birmingham Women's NHS Foundation Trust, Himalya drug company, University o...
# $ data.profile.professionalInstitution.professionalInstitutionUrl (fctr) institution/Instituto_Nacional_de_Investigaciones_y_Experiencias_Agronomicas_y_Forestales, institution/University_of_Milan, institution/Birmingham_Womens_NHS_Foundation_Trust, ...
# $ data.professionalInstitutionName (fctr) Instituto Nacional de Investigaciu00f3n y Tecnologu00eda Agraria y Alimentaria, University of Milan, Birmingham Women's NHS Foundation Trust, Himalya drug company, University o...
# $ data.professionalInstitutionUrl (fctr) institution/Instituto_Nacional_de_Investigaciones_y_Experiencias_Agronomicas_y_Forestales, institution/University_of_Milan, institution/Birmingham_Womens_NHS_Foundation_Trust, ...
# $ data.url (fctr) profile/Jose_Carbajo2, profile/Daniele_Riccio, profile/Fiona_Togneri2, profile/Sukanya_Patel, profile/Neri_Fattorini, profile/Pham_Thi_Thuy_Van, profile/Kestutis_Sasnauskas, pr...
# $ data.imageUrl (fctr) http://c1.rgstatic.net/m/797670414832/images/template/default/profile/profile_default_m.jpg, http://i1.rgstatic.net/i/profile/54a1a5539f8e2f289f_m_25d91.jpg, http://i1.rgstatic...
# $ data.imageSize (fctr) m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m, m
# $ data.imageHeight (fctr) 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, ...
# $ data.imageWidth (fctr) 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, ...
# $ data.enableFollowButton (fctr) TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR...
# $ data.enableHideButton (fctr) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
# $ data.enableConnectionButton (fctr) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
# $ data.isClaimedAuthor (fctr) TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR...
# $ data.hasExtraContainer (fctr) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
# $ data.showStatsWidgets (fctr) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
# $ data.showHideButton (fctr) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
# $ data.accountKey (fctr) Jose_Carbajo2, Daniele_Riccio, Fiona_Togneri2, Sukanya_Patel, Neri_Fattorini, Pham_Thi_Thuy_Van, Kestutis_Sasnauskas, Iris_Weintal, Godelieve_Verhaegen, Janani_Venkatraman2, Ka...
# $ data.hasInfoPopup (fctr) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
# $ data.hasTeaserPopup (fctr) TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR...
# $ data.widgetId (fctr) rgw3_5539fc8299ef4, rgw4_5539fc8299ef4, rgw5_5539fc8299ef4, rgw6_5539fc8299ef4, rgw7_5539fc8299ef4, rgw8_5539fc8299ef4, rgw9_5539fc8299ef4, rgw10_5539fc8299ef4, rgw11_5539fc829...
# $ id (fctr) rgw3_5539fc8299ef4, rgw4_5539fc8299ef4, rgw5_5539fc8299ef4, rgw6_5539fc8299ef4, rgw7_5539fc8299ef4, rgw8_5539fc8299ef4, rgw9_5539fc8299ef4, rgw10_5539fc8299ef4, rgw11_5539fc829...
# $ templateName (fctr) application/stubs/PeopleItem.html, application/stubs/PeopleItem.html, application/stubs/PeopleItem.html, application/stubs/PeopleItem.html, application/stubs/PeopleItem.html, a...
# $ templateExtensions (fctr) generalHelpers, generalHelpers, generalHelpers, generalHelpers, generalHelpers, generalHelpers, generalHelpers, generalHelpers, generalHelpers, generalHelpers, generalHelpers, ...
# $ widgetUrl (fctr) http://www.researchgate.net/application.PeopleAccountItem.html?entityId=7508014&imageSize=m&enableFollowButton=1&showHideButton=0&showConnectionButton=0&event=tp_followers_xflw...
# $ viewClass (fctr) views.application.PeopleItemView, views.application.PeopleItemView, views.application.PeopleItemView, views.application.PeopleItemView, views.application.PeopleItemView, views....
# $ yuiModules (fctr) rg.views.application.PeopleItemView, rg.views.application.PeopleItemView, rg.views.application.PeopleItemView, rg.views.application.PeopleItemView, rg.views.application.PeopleI...
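The splitting step in the loop above can be tried offline on a small made-up string. The sketch below uses base R only; the script value is invented to mimic the shape the regular expression expects and is not real ResearchGate output.

```r
# Invented sample: two widget calls in the shape the pattern expects
script <- paste0(
  '(function(){Y.rg.createInitialWidget("a",{"data":{"url":"profile/A"}});})();\n',
  '(function(){Y.rg.createInitialWidget("b",{"data":{"url":"profile/B"}});})();\n'
)

# Same pattern as above; [-1] drops the empty piece before the first match
splits <- strsplit(script, '\\(function\\(\\)\\{Y\\.rg\\.createInitialWidget\\("[^\"]+",')[[1]][-1]

# Remove the JavaScript that closes each call, leaving one JSON object per widget
payloads <- sub(");})();\n", "", splits, fixed = TRUE)
print(payloads)
```

Each element of payloads is then a JSON string that fromJSON can parse, as in the answer's code.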
Upvotes: 0