Reputation: 183
I face the following problem:
test_vector <- c('https://data.sec.gov/submissions/CIK0000789019.json', 'https://data.sec.gov/submissions/CIK0001652044.json',
'https://data.sec.gov/submissions/CIK0001018724.json')
test_df <- lapply(test_vector, jsonlite::fromJSON, flatten= TRUE) %>%
spread_all()
From the SEC webpage I try to get some basic information about companies. The JSON object has a mixed structure. Some variables are "single columns" whereas others are located in nested structures. There are a couple of problems that I can't solve:
When the value for tickers and exchange is not a scalar (in other words when a company has two tickers) I get NAs
I cant delete the last column ..JSON which turns to be a list
I played around with several options. None of them was successful.
Follow-up question:
When I try to use this approach in a loop:
cik_df <- data.frame
for (i in cik_vector) {
output <- lapply(cik_vector, jsonlite::fromJSON) %>%
spread_all
if (i > 1 & i %% 10 == 0) {
Sys.sleep(1)
}
cik_df <- rbind (cik_df, output)
}
I get the following error message:
Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed
Upvotes: 0
Views: 245
Reputation: 48
First you import what you need like so,
library(tidyr)
library(stringr)
test_vector <- c('https://data.sec.gov/submissions/CIK0000789019.json',
'https://data.sec.gov/submissions/CIK0001652044.json',
'https://data.sec.gov/submissions/CIK0001018724.json')
test_df <- lapply(test_vector, jsonlite::fromJSON, flatten= TRUE)
You can use unlist()
to take each individual value out of each company's list and store it in a data frame. Each individual value has a variable name, and the company that it's associated with.
test_frame <- data.frame(name = c(rep("Microsoft", length(unlist(test_df[[1]]))),
rep("Alphabet", length(unlist(test_df[[2]]))),
rep("Amazon", length(unlist(test_df[[3]])))),
variable_name = c(names(unlist(test_df[[1]])),
names(unlist(test_df[[2]])),
names(unlist(test_df[[3]]))),
value = c(unlist(test_df[[1]]),
unlist(test_df[[2]]),
unlist(test_df[[3]])))
Taking Alphabet as an example, it has two ticker names. One has the variable name "tickers1" and the other is "tickers2". The other two companies have only one ticker, simply called "tickers". We want to rename "tickers" (and other variable names) to "tickers1" so that all the variable names are standard between the three companies.
test_frame$variable_name[is.na(as.numeric(str_sub(test_frame$variable_name, - 1, - 1)))] <- paste(test_frame$variable_name[is.na(as.numeric(str_sub(test_frame$variable_name, - 1, - 1)))], "1", sep = "")
Now we convert this data frame to one where the column names are the individual names so that it's easy to search through.
df <- test_frame %>%
pivot_wider(names_from = variable_name, values_from = value)
Upvotes: 1
Reputation: 183
Meanwhile i found a acceptable solution:
listviewer::jsonedit(raw_json, height = "1200px", mode = "view")
# scrape relevant data out of json files using a loop approach
company <- data_frame()
for (i in cik_vector) {
raw_json <- jsonlite::read_json(i)
master_data <- tibble(cik = pluck(raw_json, "cik"),
sic = pluck(raw_json, "sic"),
sic_desc = pluck(raw_json, "sicDescription"),
company_name = pluck(raw_json, "name"),
description = pluck(raw_json, "description"),
website = pluck(raw_json, "website"),
country1= pluck(raw_json, "stateOfIncorporation"),
country2= pluck(raw_json, "stateOfIncorporationDescription"))
tickers <- tibble(symbol = pluck(raw_json, "tickers", .default= NA)) %>%
mutate(symbol= paste(unique(symbol), collapse= ",")) %>%
unique()
exchanges <- tibble(exchanges = pluck(raw_json, "exchanges")) %>%
mutate(exchanges= paste(unique(exchanges), collapse= ",")) %>%
unique()
former_names <- tibble(former_names = pluck(raw_json, "formerNames", .default= NA)) %>%
mutate(former_names= paste(unique(former_names), collapse= ",")) %>%
unique()
business <- raw_json %>%
enter_object('business') %>%
spread_all %>%
data_frame() %>%
select(1:7)
output<- cbind(master_data, tickers, exchanges, former_names, business)
company <- rbind(company, output) %>% unique
}
The listviewer package was very a very helpful resource to explore the inner structure of the json file.
Upvotes: 0