Reputation: 115
I'm new to rscopus, and am trying to find the number of publications by year for a list of authors, for whom I have extracted the author ids. However, I am not able to get this in a data frame format as intended. I am using the following code:
library(dplyr)
library(tidyr)
library(lubridate)
library(data.table)
library(rscopus)
set_api_key("MY_KEY")
hdr=inst_token_header("MY_TOKEN")
key=get_api_key()
#My data set is author_names_1
for (i in seq_along(1:10)) {
print(i)
try(scopus_author_data <- author_data(last_name = author_names_1$LastName[i],
first_name = author_names_1$FirstName[i], headers = hdr))
if(inherits(scopus_author_data, "try-error")) {
next
}
author_names_1$scopus_id[i] = scopus_author_data$au_id
author_names_1$scopus_first_name[i] = scopus_author_data$first_name
author_names_1$scopus_last_name[i] = scopus_author_data$last_name
author_names_1$scopus_first_pub_yr[i] = min(year(ymd(scopus_author_data[["full_data"]][["df"]][["prism:coverDate"]])))
scopus_id = scopus_author_data$au_id[i]
try(pub_by_yr <- data.frame(scopus_id = scopus_author_data$au_id[i],
pub_yr = year(ymd(scopus_author_data[["full_data"]][i][["df"]][["prism:coverDate"]]))))
if(inherits(pub_by_yr, "try-error")) {
next
}
}
Alternatively, I also tried:
pub_by_yr <- data.frame(matrix(ncol = 2, nrow = 0))
colnames(pub_by_yr) <- c('scopus_id', 'pub_yr')
for (i in seq_along(1:4)) {
print(i)
try(scopus_author_data <- author_data(last_name = author_names_1$LastName[i],
first_name = author_names_1$FirstName[i], headers = hdr))
if(inherits(scopus_author_data, "try-error")) {
next
}
author_names_1$scopus_id[i] = scopus_author_data$au_id
author_names_1$scopus_first_name[i] = scopus_author_data$first_name
author_names_1$scopus_last_name[i] = scopus_author_data$last_name
author_names_1$scopus_first_pub_yr[i] = min(year(ymd(scopus_author_data[["full_data"]][["df"]][["prism:coverDate"]])))
print(scopus_author_data$au_id)
scopus_id[i] = scopus_author_data$au_id
pub_by_yr[i, ] <- c(scopus_author_data$au_id[i], year(ymd(scopus_author_data$full_data$df$`prism:coverDate`)))
}
The goal here is to get a new data set with author ids (which I get from author_data()) along with the publication years. But what I am getting is publication years only for the first author id. Does anyone know how to work around this?
I would really appreciate any help here.
Thank You So Much!
Upvotes: 0
Views: 86
Reputation: 21
There may be a few issues with the first script:
Here is an updated version of the script that should address them:
library(dplyr)
library(tidyr)
library(lubridate)
library(data.table)
library(rscopus)
set_api_key("MY_KEY")
hdr = inst_token_header("MY_TOKEN")
key = get_api_key()
#My data set is author_names_1
# collect one data frame of publication years per author
pub_list <- list()
for (i in seq_along(author_names_1$LastName)) {
  print(i)
  # assign the result of try() so that a failed lookup can be detected
  scopus_author_data <- try(
    author_data(last_name = author_names_1$LastName[i],
                first_name = author_names_1$FirstName[i], headers = hdr)
  )
  if(!inherits(scopus_author_data, "try-error")) {
    # parse the cover dates once and reuse them below
    pub_years <- year(ymd(scopus_author_data[["full_data"]][["df"]][["prism:coverDate"]]))
    author_names_1$scopus_id[i] = scopus_author_data$au_id
    author_names_1$scopus_first_name[i] = scopus_author_data$first_name
    author_names_1$scopus_last_name[i] = scopus_author_data$last_name
    author_names_1$scopus_first_pub_yr[i] = min(pub_years)
    # the single author id is recycled across all of this author's years
    pub_list[[i]] <- data.frame(
      scopus_id = scopus_author_data$au_id,
      pub_yr = pub_years
    )
  }
}
# combine the per-author data frames once, after the loop
pub_by_yr <- do.call(rbind, pub_list)
In this updated script, the result of try() is assigned to scopus_author_data, so a failed author_data() call produces a "try-error" object that the if() statement can detect. The original assigned inside try(), which leaves the previous author's result in place when a call fails. For each successful lookup, the author_names_1 data frame is updated and a data frame of that author's publication years is stored in pub_list, without the [i] index on au_id and full_data that the original used. The pieces are combined with a single rbind after the loop, so results are no longer overwritten on every iteration, and this is more efficient than growing a data frame inside the loop.
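As a small illustration of why it matters where the assignment sits relative to try(), here is a sketch that does not need a Scopus key. fake_lookup() is a hypothetical stand-in for author_data() that fails for one name; nothing in it comes from rscopus.

```r
# Hypothetical stand-in for author_data(): fails for the name "bad".
fake_lookup <- function(name) {
  if (name == "bad") stop("author not found")
  paste0("id_", name)
}

names_to_try <- c("ann", "bad")

# Assignment inside try(): a failed call leaves res_inside untouched,
# so it still holds the previous author's value and the check never fires.
flag_inside <- logical(0)
for (nm in names_to_try) {
  try(res_inside <- fake_lookup(nm), silent = TRUE)
  flag_inside <- c(flag_inside, inherits(res_inside, "try-error"))
}

# Assigning the result of try(): a failed call yields a "try-error" object.
flag_outside <- logical(0)
for (nm in names_to_try) {
  res_outside <- try(fake_lookup(nm), silent = TRUE)
  flag_outside <- c(flag_outside, inherits(res_outside, "try-error"))
}

print(flag_inside)
print(flag_outside)
```

With the first pattern the failure for "bad" goes unnoticed, which is how one author's data can silently end up attached to the next.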
There may be a few issues with the second script:
The loop for (i in seq_along(1:4)) could be written as for (i in 1:4). The seq_along function creates a sequence of the same length as the argument, so seq_along(1:4) is equivalent to 1:4.
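The line scopus_id[i] = scopus_author_data$au_id could be changed to scopus_id <- scopus_author_data$au_id. scopus_id is never created in this script, so the indexed assignment fails with an "object not found" error unless a variable of that name is left over from an earlier run. Also, au_id appears to hold only the ID of the current author, so a plain assignment is enough.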
The line pub_by_yr[i, ] <- c(scopus_author_data$au_id[i], year(ymd(scopus_author_data$full_data$df$`prism:coverDate`))) is the main problem. au_id holds a single value, so au_id[i] is NA for every i greater than 1, and c() glues that one ID onto a whole vector of years, which does not fit a two-column row. Instead, build a small data frame for the current author (one row per publication) and append it with rbind. The line could be changed to:
pub_by_yr <- rbind(pub_by_yr, data.frame(scopus_id = scopus_author_data$au_id, pub_yr = year(ymd(scopus_author_data$full_data$df$`prism:coverDate`))))
After making these changes, the corrected script would be:
pub_by_yr <- data.frame(matrix(ncol = 2, nrow = 0))
colnames(pub_by_yr) <- c('scopus_id', 'pub_yr')
for (i in 1:4) {
  print(i)
  # assign the result of try() so that the error check below can work
  scopus_author_data <- try(author_data(last_name = author_names_1$LastName[i],
                                        first_name = author_names_1$FirstName[i], headers = hdr))
  if(inherits(scopus_author_data, "try-error")) {
    next
  }
  author_names_1$scopus_id[i] <- scopus_author_data$au_id
  author_names_1$scopus_first_name[i] <- scopus_author_data$first_name
  author_names_1$scopus_last_name[i] <- scopus_author_data$last_name
  author_names_1$scopus_first_pub_yr[i] <- min(year(ymd(scopus_author_data[["full_data"]][["df"]][["prism:coverDate"]])))
  print(scopus_author_data$au_id)
  scopus_id <- scopus_author_data$au_id
  # one row per publication; the single author id is recycled across the years
  pub_by_yr <- rbind(pub_by_yr, data.frame(scopus_id = scopus_id,
                                           pub_yr = year(ymd(scopus_author_data$full_data$df$`prism:coverDate`))))
}
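Since the stated goal is the number of publications by year for each author, here is a minimal base R sketch of that last step. It runs without an API key: fake_results and its cover_dates field are made-up stand-ins for what author_data() returns, not real rscopus structures.

```r
# Hypothetical per-author results standing in for author_data() output.
fake_results <- list(
  list(au_id = "111", cover_dates = c("2019-03-01", "2019-11-15", "2021-06-30")),
  list(au_id = "222", cover_dates = c("2020-01-10", "2021-02-20"))
)

# Build one data frame per author, then combine them after the loop.
pub_list <- vector("list", length(fake_results))
for (i in seq_along(fake_results)) {
  res <- fake_results[[i]]
  pub_list[[i]] <- data.frame(
    scopus_id = res$au_id,  # single id, recycled across the years
    pub_yr = as.integer(format(as.Date(res$cover_dates), "%Y")),
    stringsAsFactors = FALSE
  )
}
pub_by_yr <- do.call(rbind, pub_list)

# Count publications per author and year.
pub_counts <- aggregate(n_pubs ~ scopus_id + pub_yr,
                        data = transform(pub_by_yr, n_pubs = 1L),
                        FUN = sum)
pub_counts <- pub_counts[order(pub_counts$scopus_id, pub_counts$pub_yr), ]
print(pub_counts)
```

The same aggregate() call should work on the pub_by_yr data frame built from the real Scopus results, provided it has the scopus_id and pub_yr columns used above.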
Upvotes: 2