user8270077

Reputation: 5081

How to convert list of lists to dataframe in R

I am using quantmod to download option chains which come in the form of nested lists.

However, for my purposes I would rather have the information as a data frame in which the name of each list is stored in a column. Two columns would be needed: one containing the date of the option (its expiry) and a second containing the type of the option (call or put).

How can this be accomplished in R?

For a reproducible example:

library(quantmod)
AAPL.2015 <- getOptionChain("AAPL", "2019/2021")


And, if possible, what should I do to get the dates of the option strikes in English?

Upvotes: 2

Views: 1430

Answers (3)

Parfait

Reputation: 107767

Consider using Map to rbind each date's calls and puts data frames while adding the needed indicator columns. Because the individual data frames reside within a nested, named list, the extract function, [[, is used to pull them out.

Then, run a final do.call + rbind across the resulting list of data frames. NOTE: rbind assumes the call and put data frames have exactly the same column names and the same number of columns.

# Stack one date's calls and puts, tagging each row with its type and date
call_put_func <- function(nm, call_df, put_df) {
  cbind(rbind(transform(call_df, option_type = "call"),
              transform(put_df, option_type = "put")),
        date_of_strike = nm)
}

AAPL_flat_df_list <- Map(call_put_func,
                         nm = names(AAPL.2015),
                         call_df = lapply(AAPL.2015, "[[", "calls"),
                         put_df = lapply(AAPL.2015, "[[", "puts"))

# unname() keeps the list names out of the combined row names
AAPL_df <- do.call(rbind, unname(AAPL_flat_df_list))
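
To try the same idea without downloading anything, here is a minimal sketch on a toy nested list. The object toy_chain, the helper flatten_expiry and all the numbers are made up for illustration; they only mimic the date > calls/puts shape described in the question.

```r
# Toy stand-in for the nested list from getOptionChain(); values are invented.
toy_chain <- list(
  "Sep.27.2019" = list(
    calls = data.frame(Strike = c(140, 145), Last = c(79.50, 75.85)),
    puts  = data.frame(Strike = 140, Last = 0.01)
  ),
  "Oct.04.2019" = list(
    calls = data.frame(Strike = 150, Last = 72.22),
    puts  = data.frame(Strike = c(150, 155), Last = c(0.05, 0.10))
  )
)

# Hypothetical helper: stack calls and puts for one date and tag the rows.
flatten_expiry <- function(nm, expiry) {
  both <- rbind(
    transform(expiry$calls, option_type = "call"),
    transform(expiry$puts, option_type = "put")
  )
  cbind(both, date_of_strike = nm)
}

# Map pairs each list name with its element; do.call(rbind, ...) stacks the results.
toy_df <- do.call(rbind, unname(Map(flatten_expiry, names(toy_chain), toy_chain)))
rownames(toy_df) <- NULL
print(toy_df)
```

The result has one row per option, with option_type and date_of_strike as ordinary columns.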

Upvotes: 1

camille

Reputation: 16871

It might be more flexible to work with a couple of functions from dplyr and purrr. dplyr::bind_rows can take a list of data frames directly, and their columns don't have to match (missing columns are filled with NA), whereas base rbind needs do.call to work on a list and expects the data frames to share the same column names. bind_rows also has an argument, .id, that creates a column from the list item names. purrr::map_dfr calls a function over a list and returns a single data frame of all the results row-bound together; because it wraps bind_rows, it also has an .id argument. (As of purrr 1.0.0, map_dfr is superseded in favor of map followed by list_rbind, but it still works.)

Having access to setting those IDs is helpful because you have 2 sets of IDs: one of dates, and one of calls vs puts. Setting one ID within the inner bind_rows and one within map_dfr gets both.

Written out with a function, to make it a little easier to see:

library(quantmod)
AAPL.2015 <- getOptionChain("AAPL", "2019/2021")

aapl_df <- purrr::map_dfr(AAPL.2015, function(d) {
  dplyr::bind_rows(d, .id = "type")
  }, .id = "date")

head(aapl_df)
#>          date  type Strike  Last   Chg   Bid   Ask Vol OI
#> 1 Sep.27.2019 calls    140 79.50  9.50 77.60 77.90  10 30
#> 2 Sep.27.2019 calls    145 75.85  0.00 72.70 73.30  NA 28
#> 3 Sep.27.2019 calls    150 72.22  0.00 67.85 67.90  10 91
#> 4 Sep.27.2019 calls    155 52.53  0.00 65.80 69.90  NA 10
#> 5 Sep.27.2019 calls    160 60.10  0.00 57.85 58.15   2 11
#> 6 Sep.27.2019 calls    165 54.40 15.95 52.65 52.90   9 16

Or in more common dplyr piping with function shorthand notation:

library(dplyr)
aapl_df <- AAPL.2015 %>%
  purrr::map_dfr(~bind_rows(., .id = "type"), .id = "date")
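
As for getting the dates in English: the date labels look like formatted dates (month abbreviation, day, year), and R takes month abbreviations from the session's LC_TIME locale. Here is a minimal base R sketch, assuming the labels follow the "%b.%d.%Y" pattern seen in the output above. The vector date_labels is a hypothetical stand-in for the date column, and whether getOptionChain itself honors the locale when it builds the names is an assumption I have not verified.

```r
# Switch date formatting to the C locale so %b means English month abbreviations.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

# Hypothetical labels in the same shape as the `date` column above.
date_labels <- c("Sep.27.2019", "Oct.04.2019")

# Parse the labels into real Date values, then format them however you like.
expiry  <- as.Date(date_labels, format = "%b.%d.%Y")
english <- format(expiry, "%b %d, %Y")

# Restore the previous locale so the rest of the session is unaffected.
Sys.setlocale("LC_TIME", old_locale)
```

Once the column is a proper Date, sorting and filtering by date also work as expected, which they do not on the character labels.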

Upvotes: 1

Pedro Cavalcante

Reputation: 444

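I couldn't reproduce your example, but what you're trying to do is simple. You could use do.call to call the rbind function on the list, first on the calls and puts within each date and then across dates, and what you get at the end is a pretty dataframe.

chain <- getOptionChain("AAPL", "2019/2021")

# rbind calls and puts within each date, then rbind across dates
data <- do.call(rbind, lapply(chain, function(x) do.call(rbind, x)))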
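
One caveat, sketched here on made-up toy data (toy_chain and its numbers are invented for illustration): when the list has two levels, a single do.call(rbind, ...) over the outer list binds the inner lists rather than the data frames inside them, so the result is a matrix of list elements. Binding within each date first gives an ordinary data frame.

```r
# Toy two-level list: date -> calls/puts data frames (values are invented).
toy_chain <- list(
  "Sep.27.2019" = list(
    calls = data.frame(Strike = c(140, 145), Last = c(79.50, 75.85)),
    puts  = data.frame(Strike = 140, Last = 0.01)
  ),
  "Oct.04.2019" = list(
    calls = data.frame(Strike = 150, Last = 72.22),
    puts  = data.frame(Strike = c(150, 155), Last = c(0.05, 0.10))
  )
)

# One rbind over the outer list: a 2 x 2 matrix whose cells are data frames.
direct <- do.call(rbind, toy_chain)
str(direct, max.level = 1)

# rbind inside each date, then across dates: a plain data frame.
nested <- do.call(rbind, lapply(toy_chain, function(x) do.call(rbind, x)))
print(nested)
```

Note that in the nested version the date and the call/put type survive only in the row names, not as columns.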

Upvotes: 2
