Anisha Garg

Reputation: 115

Data set with rscopus

I'm new to rscopus, and am trying to find the number of publications by year for a list of authors, for whom I have extracted the author ids. However, I am not able to get this in a data frame format as intended. I am using the following code:

library(dplyr)
library(tidyr)
library(lubridate)
library(data.table)
library(rscopus)

set_api_key("MY_KEY")
hdr=inst_token_header("MY_TOKEN")
key=get_api_key()

#My data set is author_names_1

for (i in seq_along(1:10)) {
  print(i)
  
  try(scopus_author_data <- author_data(last_name = author_names_1$LastName[i], 
                                        first_name = author_names_1$FirstName[i], headers = hdr))
  if(inherits(scopus_author_data, "try-error")) {
    next
  }
  author_names_1$scopus_id[i] = scopus_author_data$au_id
  author_names_1$scopus_first_name[i] = scopus_author_data$first_name
  author_names_1$scopus_last_name[i] = scopus_author_data$last_name
  author_names_1$scopus_first_pub_yr[i] = min(year(ymd(scopus_author_data[["full_data"]][["df"]][["prism:coverDate"]])))
  
  scopus_id = scopus_author_data$au_id[i]
  
  try(pub_by_yr <- data.frame(scopus_id = scopus_author_data$au_id[i],
                    pub_yr = year(ymd(scopus_author_data[["full_data"]][i][["df"]][["prism:coverDate"]]))))
  if(inherits(pub_by_yr, "try-error")) {
    next
  }
}

Alternatively, I also tried:

pub_by_yr <- data.frame(matrix(ncol = 2, nrow = 0))
colnames(pub_by_yr) <- c('scopus_id', 'pub_yr')

for (i in seq_along(1:4)) {
  print(i)
  
  try(scopus_author_data <- author_data(last_name = author_names_1$LastName[i], 
                                        first_name = author_names_1$FirstName[i], headers = hdr))
  if(inherits(scopus_author_data, "try-error")) {
    next
  }
  author_names_1$scopus_id[i] = scopus_author_data$au_id
  author_names_1$scopus_first_name[i] = scopus_author_data$first_name
  author_names_1$scopus_last_name[i] = scopus_author_data$last_name
  author_names_1$scopus_first_pub_yr[i] = min(year(ymd(scopus_author_data[["full_data"]][["df"]][["prism:coverDate"]])))
  
  print(scopus_author_data$au_id)
  scopus_id[i] = scopus_author_data$au_id

  pub_by_yr[i, ] <- c(scopus_author_data$au_id[i], year(ymd(scopus_author_data$full_data$df$`prism:coverDate`)))
}

The goal is to get a new data set with the author IDs (which I get from author_data()) along with the publication years. However, I am getting publication years only for the first author ID. Does anyone know how to work around this?

I would really appreciate any help here.

Thank You So Much!

Upvotes: 0

Views: 86

Answers (1)

Scott Kelsey

Reputation: 21

There may be a few issues with the first script:

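  1. The try() call wraps the whole assignment rather than just the call to author_data(). When author_data() fails, the assignment never happens, so scopus_author_data keeps its value from the previous iteration and the inherits(scopus_author_data, "try-error") check never fires. Assign the result of try() instead.
  2. scopus_author_data$au_id[i] indexes the ID by the loop counter. If au_id holds a single ID (as the assignments to author_names_1 suggest), this is NA for every i greater than 1, which would explain why only the first author gets publication years. The same applies to [["full_data"]][i].
  3. pub_by_yr is created with data.frame() inside the loop and overwritten on every iteration, so at most the last author's rows survive. The per-author results need to be collected and combined.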
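
The try() behaviour can be reproduced without calling Scopus at all. The sketch below uses a hypothetical stand-in, fake_lookup(), which is not part of rscopus and simply fails for one input. It compares assigning inside try() with assigning the result of try().

```r
# fake_lookup() is a hypothetical stand-in for author_data(); it fails for "bad"
fake_lookup <- function(name) {
  if (name == "bad") stop("no such author")
  list(au_id = paste0("id_", name))
}

# Pattern from the question: the assignment sits inside try()
res <- fake_lookup("good")
try(res <- fake_lookup("bad"), silent = TRUE)
stale <- res$au_id                             # still the value from the "good" call
caught_inside <- inherits(res, "try-error")    # FALSE: the failure goes unnoticed

# Assigning the result of try() captures the error object instead
res2 <- try(fake_lookup("bad"), silent = TRUE)
caught_outside <- inherits(res2, "try-error")  # TRUE
```

With the first pattern, a failed lookup silently reuses the previous author's data, which is easy to miss in a long loop.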

Here's an updated version of the script that addresses these issues:

library(dplyr)
library(tidyr)
library(lubridate)
library(data.table)
library(rscopus)

set_api_key("MY_KEY")
hdr = inst_token_header("MY_TOKEN")
key = get_api_key()

#My data set is author_names_1

# one slot per author; filled inside the loop and combined afterwards
pub_list <- vector("list", nrow(author_names_1))

for (i in seq_along(author_names_1$LastName)) {
  print(i)

  # assign the result of try() so that a failure yields a "try-error" object
  scopus_author_data <- try(
    author_data(last_name = author_names_1$LastName[i], 
                first_name = author_names_1$FirstName[i], headers = hdr)
  )

  if(!inherits(scopus_author_data, "try-error")) {
    pub_years <- year(ymd(scopus_author_data[["full_data"]][["df"]][["prism:coverDate"]]))

    author_names_1$scopus_id[i] = scopus_author_data$au_id
    author_names_1$scopus_first_name[i] = scopus_author_data$first_name
    author_names_1$scopus_last_name[i] = scopus_author_data$last_name
    author_names_1$scopus_first_pub_yr[i] = min(pub_years)

    # one row per publication for this author (no [i] index on au_id)
    pub_list[[i]] <- data.frame(
      scopus_id = scopus_author_data$au_id,
      pub_yr = pub_years
    )
  }
}

# combine all authors once, after the loop; empty slots are ignored
pub_by_yr <- bind_rows(pub_list)

In this updated script, the result of try() is assigned to scopus_author_data, so a failed call to author_data() produces a "try-error" object that the if() statement can detect. When the call succeeds, the author_names_1 data frame is updated and a small data frame with one row per publication is stored in pub_list. The per-author data frames are combined into pub_by_yr with bind_rows() once the loop has finished, which avoids both overwriting the result on each iteration and repeatedly growing a data frame inside the loop.
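
The collect-then-combine pattern can be tried offline. The following is a minimal sketch in base R that assumes made-up results in place of real author_data() output: mock_results, its IDs, and its dates are all hypothetical, and NULL stands for an author whose lookup failed.

```r
# Hypothetical per-author results standing in for author_data() output;
# NULL marks an author whose lookup failed.
mock_results <- list(
  list(au_id = "111", dates = c("2019-03-01", "2021-07-15")),
  NULL,
  list(au_id = "333", dates = c("2020-01-10"))
)

pub_list <- vector("list", length(mock_results))
for (i in seq_along(mock_results)) {
  res <- mock_results[[i]]
  if (is.null(res)) next                 # skip failed lookups
  pub_list[[i]] <- data.frame(
    scopus_id = res$au_id,               # recycled to one row per publication
    pub_yr    = as.integer(format(as.Date(res$dates), "%Y")),
    stringsAsFactors = FALSE
  )
}

# Drop the empty slots and bind everything once, after the loop
pub_by_yr <- do.call(rbind, Filter(Negate(is.null), pub_list))
```

Here pub_by_yr ends up with one row per publication, and the author ID is repeated on each of that author's rows.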

There may be a few issues with the second script:

  1. The loop for (i in seq_along(1:4)) could be written as for (i in 1:4). The seq_along function creates a sequence of the same length as the argument, so seq_along(1:4) is equivalent to 1:4.

  2. The line scopus_id[i] = scopus_author_data$au_id could be changed to scopus_id <- scopus_author_data$au_id. The second script never creates scopus_id before the loop, so the indexed assignment fails with an "object not found" error unless a variable of that name is left over from an earlier run. A plain assignment works either way.

  3. The line pub_by_yr[i, ] <- c(scopus_author_data$au_id[i], year(ymd(scopus_author_data$full_data$df$`prism:coverDate`))) does not produce one row per publication. c() builds a single vector holding the ID followed by every publication year, which does not fit into one two-column row, and au_id[i] is NA for i greater than 1 if au_id holds a single ID. Instead, you could build a small data frame for the current author and append it with rbind. The line could be changed to:

    pub_by_yr <- rbind(pub_by_yr, data.frame(scopus_id = scopus_author_data$au_id, pub_yr = year(ymd(scopus_author_data$full_data$df$`prism:coverDate`))))

After making these changes, the corrected script would be:

pub_by_yr <- data.frame(matrix(ncol = 2, nrow = 0))
colnames(pub_by_yr) <- c('scopus_id', 'pub_yr')

for (i in 1:4) {
  print(i)
  
  # assign the result of try() so that a failed lookup can be detected
  scopus_author_data <- try(author_data(last_name = author_names_1$LastName[i], 
                                        first_name = author_names_1$FirstName[i], headers = hdr))
  if(inherits(scopus_author_data, "try-error")) {
    next
  }
  pub_years <- year(ymd(scopus_author_data$full_data$df$`prism:coverDate`))

  author_names_1$scopus_id[i] <- scopus_author_data$au_id
  author_names_1$scopus_first_name[i] <- scopus_author_data$first_name
  author_names_1$scopus_last_name[i] <- scopus_author_data$last_name
  author_names_1$scopus_first_pub_yr[i] <- min(pub_years)

  print(scopus_author_data$au_id)
  scopus_id <- scopus_author_data$au_id

  # append one row per publication for this author
  pub_by_yr <- rbind(pub_by_yr, data.frame(scopus_id = scopus_id, pub_yr = pub_years))
}
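
Since the question asks for the number of publications by year, a table with one row per publication still needs to be counted. The sketch below shows one way to do that in base R, assuming pub_by_yr has the two columns used above. The IDs and years are made up for illustration.

```r
# Illustrative data in the shape produced by the loop (IDs are made up)
pub_by_yr <- data.frame(
  scopus_id = c("111", "111", "111", "333"),
  pub_yr    = c(2019, 2019, 2021, 2020),
  stringsAsFactors = FALSE
)

# Add a helper column n = 1 and sum it per author-year combination
counts <- aggregate(n ~ scopus_id + pub_yr,
                    data = transform(pub_by_yr, n = 1),
                    FUN = sum)

# Sort by author, then by year, for readability
counts <- counts[order(counts$scopus_id, counts$pub_yr), ]
```

With dplyr loaded, count(pub_by_yr, scopus_id, pub_yr) should give an equivalent table in a single call.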

Upvotes: 2
