Reputation: 416
I have a data frame with observations on YouTube video_ids. When used in an API call, these ids allow me to fetch data on certain videos that I use to enrich my dataset.
First I created a list of unique video_ids with the script below. This returns a large list of 6350 unique elements.
video_ids <- list();
index <- 1
for(i in unique(df$video_id)){
  video_ids[[index]] <- list(
    video_id = i
  )
  index <- index + 1
}
The API documentation asks for a comma-separated list of video ids. I did that by using unlist(video_ids), which returns a large vector. I cannot use this vector in the API call, because it is way too long.
The maximum number of ids I can process in one API call is 50.
library(httr)
api_key = "xxxx"
process_ids = unlist(video_ids[1:50]) #pass the first 50 elements of the video_ids list
url <- modify_url("https://www.googleapis.com/youtube/v3/videos",
  query = list(
    "part" = "snippet",
    "id" = paste(process_ids, collapse=","),
    "key" = api_key)
)
output <- content(GET(url), as = "parsed", type = "application/json")
What is the best approach for this in R? Can I loop through my list of 6350 elements in batches of 50, removing each batch from the list once its iteration completes?
My current script below loops through each video id in the list and fetches the data I need from the API response. This works, but it is very slow and requires one API call per id (6350 calls in total). It can't be the most efficient way to approach this.
result <- list();
index <- 1
for (id in video_ids) {
  api_key = "xxxx"
  url <- modify_url("https://www.googleapis.com/youtube/v3/videos",
    query = list(
      "part" = "snippet",
      "id" = paste(id, collapse=","),
      "key" = api_key)
  )
  output <- content(GET(url), as = "parsed", type = "application/json")
  #Adds what I need from the output to a list called result
  for(t in output$items){
    result[[index]] <- list(
      video_id = t$id,
      channel_id = t$snippet$channelId
    )
    #Increment per item so that several items in one response do not overwrite each other
    index <- index + 1
  }
}
Upvotes: 0
Views: 75
Reputation: 389065
You can try the following:
Split the video ids into groups of 50 values and pass each group to the API in a single call.
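library(httr)
# api_key is assumed to be defined as in the question
vec = unlist(video_ids)
# ceiling(seq_along(vec)/50) assigns positions 1-50 to group 1, 51-100 to group 2, and so on
result <- lapply(split(vec, ceiling(seq_along(vec)/50)), function(x) {
  url <- modify_url("https://www.googleapis.com/youtube/v3/videos",
    query = list(
      "part" = "snippet",
      "id" = paste(x, collapse=","),
      "key" = api_key))
  # One request per group of up to 50 ids; returns the parsed response
  content(GET(url), as = "parsed", type = "application/json")
})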
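To see what the split does without calling the API, here is a minimal sketch on made-up ids. The ids, the mock response, and the helper name extract_items are assumptions for illustration only. The helper assumes each parsed response has the shape used in the question (items with id and snippet$channelId).

```r
# Made-up ids standing in for unlist(video_ids); 6350 matches the count in the question.
vec <- sprintf("id%04d", 1:6350)

# Group label per position: 1-50 -> 1, 51-100 -> 2, ...
chunks <- split(vec, ceiling(seq_along(vec) / 50))

length(chunks)        # number of API calls needed: 127
max(lengths(chunks))  # largest batch size: 50

# Hypothetical helper: flatten one parsed response into a data frame.
extract_items <- function(output) {
  do.call(rbind, lapply(output$items, function(t) {
    data.frame(video_id = t$id,
               channel_id = t$snippet$channelId,
               stringsAsFactors = FALSE)
  }))
}

# Mock parsed response used instead of a real API call.
mock_output <- list(items = list(
  list(id = "id0001", snippet = list(channelId = "chanA")),
  list(id = "id0002", snippet = list(channelId = "chanB"))
))

parsed <- extract_items(mock_output)
```

With the real responses, something like do.call(rbind, lapply(result, extract_items)) should give one row per video, which can then be merged back into df by video_id. This cuts the work from 6350 requests to 127.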
Upvotes: 1