Philipp Schulz
Philipp Schulz

Reputation: 183

How do I store values of .json into R dataframe?

I face the following problem:

test_vector <- c('https://data.sec.gov/submissions/CIK0000789019.json', 'https://data.sec.gov/submissions/CIK0001652044.json',
                  'https://data.sec.gov/submissions/CIK0001018724.json')
  
test_df <- lapply(test_vector, jsonlite::fromJSON, flatten= TRUE) %>%
  spread_all()

From the SEC webpage I try to get some basic information about companies. The JSON object has a mixed structure. Some variables are "single columns" whereas others are located in nested structures. There are a couple of problems that I can't solve:

  1. When the value for tickers and exchange is not a scalar (in other words when a company has two tickers) I get NAs

  2. I cant delete the last column ..JSON which turns to be a list

I played around with several options. None of them was successful.

Follow-up question:

When I try to use this approach in a loop:

cik_df <- data.frame

for (i in cik_vector) {
  
  output <- lapply(cik_vector, jsonlite::fromJSON) %>%
    spread_all
  
  if (i > 1 & i %% 10 == 0) {
    Sys.sleep(1)
  }
  
  cik_df <- rbind (cik_df, output) 
}

I get the following error message:

Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed

Upvotes: 0

Views: 245

Answers (2)

Ben Smith
Ben Smith

Reputation: 48

First you import what you need like so,

library(tidyr)
library(stringr)

test_vector <- c('https://data.sec.gov/submissions/CIK0000789019.json',
                 'https://data.sec.gov/submissions/CIK0001652044.json',
                 'https://data.sec.gov/submissions/CIK0001018724.json')

test_df <- lapply(test_vector, jsonlite::fromJSON, flatten= TRUE)

You can use unlist() to take each individual value out of each company's list and store it in a data frame. Each individual value has a variable name, and the company that it's associated with.

test_frame <- data.frame(name = c(rep("Microsoft", length(unlist(test_df[[1]]))), 
                                  rep("Alphabet", length(unlist(test_df[[2]]))),
                                  rep("Amazon", length(unlist(test_df[[3]])))),

                         variable_name = c(names(unlist(test_df[[1]])),
                                           names(unlist(test_df[[2]])),
                                           names(unlist(test_df[[3]]))),
                     
                         value = c(unlist(test_df[[1]]), 
                                   unlist(test_df[[2]]),
                                   unlist(test_df[[3]])))

Taking Alphabet as an example, it has two ticker names. One has the variable name "tickers1" and the other is "tickers2". The other two companies have only one ticker, simply called "tickers". We want to rename "tickers" (and other variable names) to "tickers1" so that all the variable names are standard between the three companies.

test_frame$variable_name[is.na(as.numeric(str_sub(test_frame$variable_name, - 1, - 1)))] <- paste(test_frame$variable_name[is.na(as.numeric(str_sub(test_frame$variable_name, - 1, - 1)))], "1", sep = "")

Now we convert this data frame to one where the column names are the individual names so that it's easy to search through.

df <- test_frame %>%
  pivot_wider(names_from = variable_name, values_from = value)

Upvotes: 1

Philipp Schulz
Philipp Schulz

Reputation: 183

Meanwhile i found a acceptable solution:

listviewer::jsonedit(raw_json, height = "1200px", mode = "view")


# scrape relevant data out of json files using a loop approach

company <- data_frame()


for (i in cik_vector) {
  
  raw_json <- jsonlite::read_json(i)
    
  master_data <- tibble(cik = pluck(raw_json, "cik"),
                        sic = pluck(raw_json, "sic"),
                        sic_desc = pluck(raw_json, "sicDescription"),
                        company_name = pluck(raw_json, "name"),
                        description = pluck(raw_json, "description"),
                        website = pluck(raw_json, "website"),
                        country1= pluck(raw_json, "stateOfIncorporation"),
                        country2= pluck(raw_json, "stateOfIncorporationDescription"))
  
  tickers <- tibble(symbol = pluck(raw_json, "tickers", .default= NA)) %>% 
    mutate(symbol= paste(unique(symbol), collapse= ",")) %>%
    unique()
  
  exchanges <- tibble(exchanges = pluck(raw_json, "exchanges")) %>% 
    mutate(exchanges= paste(unique(exchanges), collapse= ",")) %>%
    unique()
  
  former_names <- tibble(former_names = pluck(raw_json, "formerNames", .default= NA)) %>% 
    mutate(former_names= paste(unique(former_names), collapse= ",")) %>%
    unique()
  
  business <- raw_json %>%
    enter_object('business') %>%
    spread_all %>%
    data_frame() %>%
    select(1:7)
  
  output<- cbind(master_data, tickers, exchanges, former_names, business)
  
  company <- rbind(company, output) %>% unique
  
}

The listviewer package was very a very helpful resource to explore the inner structure of the json file.

Upvotes: 0

Related Questions