Reputation: 181
I'm experimenting with the quantile function on independent data frames.
A very simple example to illustrate my case:
quantile(x <- rnorm(1001))
0% 25% 50% 75% 100%
-2.930587810 -0.687108751 0.004405246 0.644589258 2.839597566
# Subdivide the quantile results into 5 independent objects. For example:
list2env(setNames(as.list(quantile(x <- rnorm(1001))), paste0("Q", 0:4)), .GlobalEnv)
So now I have each quantile result stored under its corresponding quartile number: Q0, Q1, Q2, Q3, Q4.
Now I'd like to apply the same to a "Large list" (large_list) with more than 400 elements, so I guess I need a different approach (a function) that I can apply to all 400 elements of my list.
Here I need the community's help. This is my approach:
#Read all elements of the list in the environment,create a new column to be named,
# Elementname.Quartilenumber that contains which
# Q (0,1,2,3,4) number the data belongs to.
Qnumber <- function(x) {
  element_name <- stringi::stri_extract(names(x)[1], regex = "^[A-Z]+")
  element_name <- paste0(element_name, ".Quartilenumber")
  column_names <- c(names(x), element_name)
  x$quartile <- quantile(large_list$.)
  x <- setNames(x, column_names)
  return(x)
}
Any help would be greatly appreciated.
Thank you very much.
Upvotes: 2
Views: 1110
Reputation: 1392
For each element in your list, do the following:
1. Calculate the quantiles, as you have done: qx <- quantile(x)
2. Count how many of these quantile values are <= each datum: sum(x[i] >= qx). This corresponds to the quartile number in all but one case, the maximum value (you get NA for this one, because the sum is 5 and there are only four quartile names to index).
3. Set the maximum value's quartile to the 4th quartile ('Q4').
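The steps above can be sketched on a single vector as follows. This is a minimal illustration rather than a definitive implementation; the names x, qx, idx and qnum are placeholders chosen here, and the seed is arbitrary and only there to make the run repeatable.

```r
set.seed(1)                     # arbitrary seed, only for repeatability
x  <- rnorm(10)
qx <- quantile(x)               # 0%, 25%, 50%, 75%, 100%

qnames <- c("Q1", "Q2", "Q3", "Q4")

# For each datum, count the quantile values it is greater than or equal to.
idx  <- sapply(x, function(v) sum(v >= qx))
qnum <- qnames[idx]             # the maximum has idx == 5, so it comes back NA

# Put the maximum into the 4th quartile.
qnum[is.na(qnum)] <- "Q4"

data.frame(x = x, qnum = qnum)
```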
Here are some fake data (a list of data frames):
list.1 <- list()
for (i in 1:5) {
  list.1[[i]] <- data.frame('elem_data'=rnorm(10))
}
Step through the list of data.frames and add the quartile column.
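qnames <- c('Q1','Q2','Q3','Q4')
for (i in 1:5) {
  # quantiles of this element: 0%, 25%, 50%, 75%, 100%
  qx <- quantile(list.1[[i]]$elem_data)
  # count the quantile values each datum is >= and use that count as an index into qnames
  list.1[[i]]$qnum <- sapply(list.1[[i]]$elem_data, function(x) qnames[sum(x >= qx)])
  # the maximum indexes past the end of qnames (NA), so assign it to Q4
  list.1[[i]]$qnum[is.na(list.1[[i]]$qnum)] <- qnames[4]
}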
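As an alternative sketch of the same loop, the per-element work can be wrapped in a helper and applied with lapply(). The helper name add_quartile, the list name list.2 and the default column name are hypothetical, and findInterval() is swapped in here for the sum() count. With rightmost.closed = TRUE it places the maximum in the last interval, so no NA fix-up should be needed.

```r
# Hypothetical helper: adds a quartile label column to one data frame.
add_quartile <- function(df, col = "elem_data") {
  qx <- quantile(df[[col]])     # five break points: 0%, 25%, 50%, 75%, 100%
  # findInterval() returns i where qx[i] <= value < qx[i + 1];
  # rightmost.closed = TRUE puts the maximum in interval 4 instead of 5.
  idx <- findInterval(df[[col]], qx, rightmost.closed = TRUE)
  df$qnum <- paste0("Q", idx)
  df
}

# Fake data in the same shape as list.1 above.
set.seed(42)                    # arbitrary seed, only for repeatability
list.2 <- lapply(1:5, function(i) data.frame(elem_data = rnorm(10)))

list.2 <- lapply(list.2, add_quartile)
head(list.2[[1]])
```

Because the helper takes one data frame and returns one, the same call should work unchanged on a list of 400 elements, provided each element has the named column.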
I tried this with a list of 1000 data.frames with 1000 data elements each, and it took about 2.5 seconds (on a mid-2013 MacBook Air).
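A timing like this can be reproduced roughly with system.time(). The sketch below assumes the same fake-data shape as above; the names big and elapsed are placeholders, and the measured time will differ from machine to machine.

```r
# Build 1000 data frames of 1000 values each (fake data).
big <- lapply(1:1000, function(i) data.frame(elem_data = rnorm(1000)))
qnames <- c("Q1", "Q2", "Q3", "Q4")

elapsed <- system.time({
  for (i in seq_along(big)) {
    qx <- quantile(big[[i]]$elem_data)
    q  <- sapply(big[[i]]$elem_data, function(x) qnames[sum(x >= qx)])
    q[is.na(q)] <- qnames[4]    # the maximum indexes past qnames, so fix it up
    big[[i]]$qnum <- q
  }
})
elapsed["elapsed"]              # wall-clock seconds; varies by machine
```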
Upvotes: 2