Reputation: 5907
I found this Reddit post here - https://www.reddit.com/r/obama/comments/xgsxy7/donald_trump_and_barack_obama_are_among_the/ .
I would like to use the API to get all the comments from this post.
I tried looking into the documentation of this API (e.g. https://github.com/pushshift/api) and this does not seem to be possible. If I could somehow get the LINK_ID pertaining to this Reddit post, I think I would be able to do it.
Is this possible to do?
UPDATE: Can someone please show me how to do this in R?
Thanks!
This is my attempt so far, which searches comments by keyword rather than by post:
library(jsonlite)

# Pieces of the Pushshift comment-search URL (after/before are offsets in hours)
part1 <- "https://api.pushshift.io/reddit/search/comment/?q=trump&after="
part2 <- "h&before="
part3 <- "h&size=500"

results <- list()
for (i in 1:10) {
  tryCatch({
    # e.g. i = 1 builds .../search/comment/?q=trump&after=2h&before=1h&size=500
    url_i <- paste0(part1, i + 1, part2, i, part3)
    r_i <- fromJSON(url_i)
    results[[i]] <- data.frame(r_i$data$body, r_i$data$id, r_i$data$parent_id, r_i$data$link_id)
    # myvec_i <- sapply(results, NROW)
    # print(c(i, sum(myvec_i)))
    print(i)
    # ifelse(i %% 200 == 0, saveRDS(results, "results_index.RDS"), "")
  }, error = function(e) {})
}
final <- do.call(rbind.data.frame, results)
Upvotes: 1
Views: 1414
Reputation: 100
This is how you can do it in R:
# Import required library
library(jsonlite)

# Set API endpoint and parameters
part1 <- "https://api.pushshift.io/reddit/search/comment/?q=trump&after="
part2 <- "h&before="
part3 <- "h&size=500"

# Initialize empty list for storing results
results <- list()

# Loop through API requests
for (i in 1:10) {
  # Construct API request URL
  url_i <- paste0(part1, i + 1, part2, i, part3)

  # Send GET request to the API
  r_i <- fromJSON(url_i)

  # Extract data from API response and store in list
  results[[i]] <- data.frame(body = r_i$data$body,
                             id = r_i$data$id,
                             parent_id = r_i$data$parent_id,
                             link_id = r_i$data$link_id)

  # Print progress
  cat("Request", i, "complete\n")
}

# Combine list of results into a single data frame
final <- do.call(rbind.data.frame, results)
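Since the question is about one specific post: the link_id column in final holds the fullname of the submission each comment belongs to, and Reddit prefixes submission fullnames with "t3_". As a small sketch (this only helps if the keyword search above happened to return comments from that post, which is not guaranteed), you could filter the combined data frame like this:
# Keep only comments that belong to the post xgsxy7 from the question
# ("t3_" is Reddit's fullname prefix for submissions)
target <- subset(final, link_id == "t3_xgsxy7")
nrow(target)  # how many of the fetched comments came from that post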
Refactor: You can also slightly refactor the code to:
library(purrr)
library(httr)
library(jsonlite)

# Set API endpoint and parameters
endpoint <- "https://api.pushshift.io/reddit/search/comment/"
params <- list(q = "trump", size = 500)

# Function to fetch one window of comments (after/before are offsets in hours)
fetch_data <- function(after, before) {
  query <- list(after = paste0(after, "h"), before = paste0(before, "h"))
  response <- GET(url = endpoint, query = c(params, query))
  parsed <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
  parsed$data[, c("body", "id", "parent_id", "link_id")]
}

# Use map() to fetch data for multiple requests
results <- map(1:10, ~ fetch_data(.x + 1, .x))

# Combine list of results into a single data frame
final <- do.call(rbind.data.frame, results)
In this code, we've used httr::GET() to send the request and httr::content() to pull out the response text, which is then parsed with jsonlite::fromJSON(). We've also used purrr::map() to fetch data for multiple requests instead of using a for loop. These changes should make the code more concise and easier to read.
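As a small variation (a sketch, assuming the dplyr package is also installed, since purrr::map_dfr() binds rows via dplyr::bind_rows()), the mapping and row-binding steps can be collapsed into a single call:
# Fetch the ten hourly windows and bind the resulting data frames by row
final <- map_dfr(1:10, ~ fetch_data(.x + 1, .x))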
Upvotes: 1
Reputation: 126
The link ID of the post is in the URL: https://www.reddit.com/r/obama/comments/xgsxy7 (the last segment, xgsxy7, is the ID).
You could even visit https://www.reddit.com/xgsxy7 to get to the post.
If you fetch the endpoint https://www.reddit.com/xgsxy7.json you get the post and its comments as JSON; you then need to traverse the object to find them.
JS example:
const data = fetchedJSONObject;
const comments = data[1].data.children.map(comment => comment.data.body); // to get the text body
And you can just walk the JSON object and pull out whatever data you want from it: whether the comment has nested replies, its creation time, its author, etc.
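Since the question asks for R, here is a rough sketch of the same idea with jsonlite (one assumption here is that Reddit serves the request without extra headers; in practice you may need to set a custom User-Agent to avoid being rate-limited):
library(jsonlite)

# Fetch the post's JSON; simplifyVector = FALSE keeps the raw nested structure
raw <- fromJSON("https://www.reddit.com/xgsxy7.json", simplifyVector = FALSE)

# raw is a list of two listings: [[1]] is the submission, [[2]] the comment tree
children <- raw[[2]]$data$children

# Keep only actual comments (kind "t1"); the last child can be a "more" stub
comments <- Filter(function(x) x$kind == "t1", children)

# Pull out the text body of each top-level comment
# (nested replies, if any, sit under each comment's data$replies)
bodies <- vapply(comments, function(x) x$data$body, character(1))
head(bodies)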
Upvotes: 4