Yunlong Huang

Reputation: 85

R List subsetting using "[[" shows error subscript out of bounds

I am trying to subset a large list with 278226 elements. Each element (shown below) is itself a list with between 39 and 50 sub-elements, each a length-1 atomic vector with its own name.

> str(listings_England[9922])
List of 1
 $ listing:List of 40
  ..$ agent_address       : chr "35 John Street, Luton"
  ..$ agent_logo          : chr "https://st.zoocdn.com/zoopla_static_agent_logo_(257607).png"
  ..$ agent_name          : chr "Ashton Carter Homes"
  ..$ agent_phone         : chr "020 8115 4543"
  ..$ category            : chr "Residential"
  ..$ country             : NULL
  ..$ country_code        : chr "gb"
  ..$ county              : NULL
  ..$ displayable_address : chr "Hatters Way Luton, Luton LU1"
  ..$ first_published_date: chr "2017-11-16 17:25:36"
  ..$ last_published_date : chr "2018-01-29 18:40:52"
  ..$ latitude            : chr "51.88188"
  ..$ listing_id          : chr "39336869"
  ..$ listing_status      : chr "sale"
  ..$ longitude           : chr "-0.43237194"

Then I extract sub-elements such as "listing_id" as below:

> id1 <- sapply(listings_England, "[[", "listing_id")
Error in FUN(X[[i]], ...) : subscript out of bounds
> id3 <- sapply(listings_England[1:100000], "[[", "listing_id")
Error in FUN(X[[i]], ...) : subscript out of bounds
> id2 <- sapply(listings_England[1:50000], "[[", "listing_id")
> 

> listings_England$listing_id
NULL
> 

As you can see, only the last call, on the first 50000 elements, works (the purrr::map family of functions has the same problem). Is this a limitation of these functions? My current workaround is:

id <- sapply(listings_England, function(x) x["listing_id"]) %>% as.numeric()

The problem is that "[[" and "$" do not work on this large list; only "[" does.

Upvotes: 2

Views: 4532

Answers (4)

Yunlong Huang

Reputation: 85

This is the "missing/out-of-bounds indices" problem: [ and [[ differ slightly in their behaviour when the index is out of bounds (OOB). Details are in section 4.3.3 of the "Advanced R" book: https://adv-r.hadley.nz/subsetting.html#subsetting-operators
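
A small sketch of that difference on toy data (the objects `lst` and `vec` are made up for illustration and are not taken from the question). Note that `[[` with a missing name returns NULL on a list but raises "subscript out of bounds" on an atomic vector, which suggests that some elements of the large list may not be lists at all:

```r
# Toy objects, assumed for illustration only
lst <- list(listing_id = "39336869")      # a list element
vec <- c(agent_name = "Ashton Carter")    # an atomic (character) vector

# `[` does not error on a missing name; it returns a placeholder
lst["missing"]    # list of length 1 holding NULL, named <NA>
vec["missing"]    # NA, named <NA>

# `[[` on a list returns NULL for a missing name
is.null(lst[["missing"]])

# `[[` on an atomic vector raises "subscript out of bounds"
msg <- tryCatch(vec[["missing"]], error = function(e) conditionMessage(e))
msg
```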

Upvotes: 0

IRTFM

Reputation: 263499

You have what I would call a "nested list". You can see from the str output that there is only one item at the top of your "element tree". Try this:

id1 <- sapply(listings_England[[1]], "[[", "listing_id")

This extracts the first item (which holds all of the content) and works on the resulting list. The equivalent operation also works:

id1 <- sapply(listings_England$listing, "[[", "listing_id")

Upvotes: 0
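
Whether there really is a single top-level item can be checked directly, because `str(x[i])` for a single index reports "List of 1" no matter how long `x` is. A minimal sketch on a made-up two-element list (`toy` and its contents are assumptions, not the asker's data):

```r
# Toy stand-in for the real object (assumed structure)
toy <- list(
  listing = list(listing_id = "1"),
  listing = list(listing_id = "2")
)

length(toy)     # number of top-level elements
str(toy[2])     # `[` keeps the wrapper: "List of 1" with $ listing
str(toy[[2]])   # `[[` drops it and shows the listing's own fields
```

If `length()` on the real object returns 278226 rather than 1, the elements are siblings at the top level and the extraction has to be applied to each of them.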

Parfait

Reputation: 107767

As @JesseTweedle comments, your issue is specific to your data. Somewhere in your data object, listing_id does not exist as a named element, so the extraction raises an error. Consider wrapping the extraction in tryCatch inside sapply so that elements without listing_id return NA, using either [[ or $:

id2 <- sapply(listings_England[1:100000], function(x) 
                 tryCatch(x[["listing_id"]],
                          warning = function(w) return(NA),
                          error = function(e) return(NA)
                 )
       ) 

Additionally, per your post it looks like you have a nested structure with a named listing. Try this:

id2 <- sapply(listings_England[1:100000], function(x) 
                 tryCatch(x$listing$listing_id,
                          warning = function(w) return(NA),
                          error = function(e) return(NA)
                 )
       ) 
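
As a rough illustration of the same pattern on data small enough to inspect, here is a sketch with a hypothetical three-element list (`toy` is invented for this example; the third element is an atomic vector, one kind of element that makes `[[` fail):

```r
# Hypothetical miniature of the data: two well-formed listings and one
# atomic vector that lacks "listing_id"
toy <- list(
  listing = list(listing_id = "39336869", category = "Residential"),
  listing = list(listing_id = "39336870", category = "Residential"),
  listing = c(error = "not found")
)

# Same pattern as above: fall back to NA when extraction fails
ids <- sapply(toy, function(x)
  tryCatch(x[["listing_id"]], error = function(e) NA_character_)
)

unname(ids)         # "39336869" "39336870" NA
which(is.na(ids))   # position(s) of the problematic elements
```

The positions returned by `which(is.na(ids))` can then be inspected with `str()` to see what those elements actually contain. One caveat: if an element is a list that simply lacks `listing_id`, `[[` returns NULL instead of erroring, and sapply then returns a list rather than a vector.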

Upvotes: 1

Tim Biegeleisen

Reputation: 522817

If you want to convert the listing_id entry to numeric, just use as.numeric directly:

listings_England$listing_id <- as.numeric(listings_England$listing_id)

sapply is what you would use if you wanted to apply a function to each element across a vector. But since as.numeric is already vectorized, you don't need an apply function in this case.
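
A minimal sketch of the vectorized conversion, assuming the IDs have already been extracted into a character vector (`ids_chr` is a made-up name and the values are illustrative):

```r
# Made-up IDs in the character form shown in the question's str() output
ids_chr <- c("39336869", "39336870", NA)

# One vectorized call converts the whole vector; NA stays NA
ids_num <- as.numeric(ids_chr)
ids_num
```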

Upvotes: 0
