Wiam Nasr

Reputation: 43

JSON applied over a dataframe in R

I ran the code below against a single search query and it returned a perfect result.

The keyword being searched for (Emaar) is pasted at the end of the query:

library(httr)
library(jsonlite)



query<-"https://www.googleapis.com/customsearch/v1?key=AIzaSyA0KdZHRkAjmoxKL14eEXp2vnI4Yg_po38&cx=006431301429107149113:as7yqcm2qc8&q=Emaar"

result11 <- content(GET(query))
print(result11)
result11_JSON <- toJSON(result11)
result11_JSON <- fromJSON(result11_JSON)
result11_df <- as.data.frame(result11_JSON)

Now I want to apply the same steps over a data.frame containing keywords, so I created the test .csv file below:

     Company Name
[1]  ADES International Holding Ltd
[2]  Emirates REIT (CEIC) Limited
[3]  POLARCUS LIMITED

I called it Testing Website Extraction.csv.

Code used:

test_companies <- read.csv("... \\Testing Website Extraction.csv")

# Replace spaces with "+" and paste the query prefix in front (the query already has my unique Google key and search engine ID)
test_companies$plus <- gsub(" ", "+", test_companies$Company.Name)


query <- "https://www.googleapis.com/customsearch/v1?key=AIzaSyCmD6FRaonSmZWrjwX6JJgYMfDSwlR1z0Y&cx=006431301429107149113:as7yqcm2qc8&q="

test_companies$plus <- paste0(query, test_companies$plus)

a <- test_companies$plus
length(a)
function_webs_search <- function(web_search) {content(GET(web_search))}



result <- lapply(as.character(a), function_webs_search)

The result is a list of length 3 (one element per search term). Each element is a sublist containing url (list[2]), queries (list[2]), ... items (list[10]), and the structure is the same for every search term. My issue is applying the remainder of the code.

# When I run:
result_JSON <- toJSON(result)
result_JSON <- as.list(fromJSON(result_JSON))

I get a list of 6 lists, each of which has sublists.

Putting this into a tidy data frame where the results are stacked underneath each other (not kept separate) is proving to be difficult.

Note that I also tried working from the "result" list, taking each of its 3 lists one at a time, but that is a lot of manual labor if I have a longer list of keywords.

The expected end result should contain 30 observations of 37 variables (10 observations of 37 variables for each search term, all stacked underneath each other).

Things I have tried unsuccessfully:

These work to flatten the list:
#do.call(c , result)
#all.equal(listofvectors, res, check.attributes = FALSE)
#unlist(result, recursive = FALSE)
# for (i in 1:length(result))  {listofvectors <- c(listofvectors, result[[i]])}
#rbind()
#rbind.fill()

Even after flattening, I don't know how to organize the results into a tidy final output that a non-R user can interact with.

Any help here would be greatly appreciated.

I am here in case anything is not clear about my question.

I am always happy to learn more about R, so please bear with me as I am just starting to catch up.

All the best and thanks in advance!

Upvotes: 1

Views: 51

Answers (1)

Wiam Nasr

Reputation: 43

Basically, what I did was extract only the columns I need from each data frame in the list. The final code is below:

library(httr)
library(jsonlite)
library(tidyr)
library(stringr)
library(purrr)
library(plyr)


test_companies <- read.csv("c:\\users\\... Companies Without Websites List.csv")

test_companies$plus <- gsub(" ", "+", test_companies$Company.Name)


query <- "https://www.googleapis.com/customsearch/v1?key=AIzaSyCmD6FRaonSmZWrjwX6JJgYMfDSwlR1z0Y&cx=006431301429107149113:as7yqcm2qc8&q="

test_companies$plus <- paste0(query, test_companies$plus)

a <- test_companies$plus
length(a)
function_webs_search <- function(web_search) {content(GET(web_search))}



result <- lapply(as.character(a), function_webs_search)

function_toJSONall <- function(all) {toJSON(all)}

a <- lapply(result, function_toJSONall)


function_fromJSONall <- function(all) {fromJSON(all)}

b <- lapply(a, function_fromJSONall)


function_dataframe <- function(all) {as.data.frame(all)}

c <- lapply(b, function_dataframe)

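# Keep only the columns of interest (positions 15 to 30) from each data frame
function_column <- function(all) {all[ ,15:30]}

result_final <- lapply(c, function_column)

# Stack the trimmed data frames; rbind.fill() pads any missing columns with NA
results_df <- rbind.fill(result_final)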
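
Selecting columns by position (15:30) may break if the API returns a different set of fields for some search term. As a rough alternative sketch, not the code I actually ran, the per-term results can be stacked by column name, with each row tagged by its search term. Everything below is an assumption made for illustration: `items_by_term` is hypothetical mock data standing in for the per-term `items` data frames (which on real responses could possibly be obtained with something like `fromJSON(content(resp, as = "text"), flatten = TRUE)$items`), and `bind_items()` is a made-up helper that imitates rbind.fill() in base R.

```r
# Hypothetical stand-ins for the per-term `items` data frames.
# Column names and values are illustrative only, not real API output.
items_by_term <- list(
  "ADES International Holding Ltd" = data.frame(
    title = c("ADES home", "ADES news"),
    link = c("https://example.com/a1", "https://example.com/a2"),
    stringsAsFactors = FALSE
  ),
  "POLARCUS LIMITED" = data.frame(
    title = "Polarcus home",
    link = "https://example.com/p1",
    snippet = "Example snippet text",
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: stack data frames whose columns differ, padding
# missing columns with NA and adding a search_term column to every row.
bind_items <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- Map(function(df, term) {
    # Add any column this data frame lacks, filled with NA
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA
    }
    # Put columns in a common order and tag rows with the search term
    data.frame(search_term = term, df[all_cols], stringsAsFactors = FALSE)
  }, dfs, names(dfs))
  out <- do.call(rbind, unname(padded))
  rownames(out) <- NULL
  out
}

results_df <- bind_items(items_by_term)
print(results_df)
```

Because the columns are matched by name rather than position, a term whose response lacks a field simply gets NA in that column, and the search_term column lets a non-R user filter the final table by company.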

Upvotes: 1
