Reputation: 31
I have a data.frame called fd with 406 rows and 48 columns. For each row in fd I want to compute ntiles (sixtiles). I do this the following way:
quant <- apply(fd, 1, function(x) quantile(t(x), probs = c(1/6, 2/6, 0.5, 4/6, 5/6), na.rm = TRUE ))
What I now want to do, is split my original data into 6 new dataframes, i.e. fd1 to fd6, where in fd1 I have all the observations of the first sixtile, in fd2 I have all the observations of the second sixtile and so on. Again, I want to do this rowwise. Meaning, I want my algorithm/function to look at each row of fd and do the following:
Take all the observations of the first sixtile, in the first row of fd, and store them into the first row of fd1, then take the first sixtile of the second row of fd and store them in the second row of fd1.
Important to note: I do not have observations for each row and column, so in some I have missing data (NA)
Could anybody give hints on how I can achieve this? Thanks in advance.
Upvotes: 1
Views: 568
Reputation: 1203
Here is a relatively shorter way of achieving this using purrr
and dplyr
packages:
library(dplyr)
library(purrr)
# some random example
df <- data.frame(matrix(runif(48),405,48))
df[3,5] <- NA
df[10,25:26] <- NA
quant <- apply(df, 1, function(x) aa <- quantile(t(x), probs = c(1/6, 2/6, 3/6, 4/6, 5/6), na.rm = TRUE ))
aa <- as.data.frame(t(df))
fd1 <- map2(quant[1,],aa,function(x,y) y[y <= x] %>% .[!is.na(.)]) %>%
do.call(rbind,.)%>% as.data.frame(.)
fd2 <- pmap(list(quant[1,],quant[2,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>%
do.call(rbind,.) %>% as.data.frame(.)
fd3 <- pmap(list(quant[2,],quant[3,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>%
do.call(rbind,.) %>% as.data.frame(.)
fd4 <- pmap(list(quant[3,],quant[4,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>%
do.call(rbind,.) %>% as.data.frame(.)
fd5 <- pmap(list(quant[4,],quant[5,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>%
do.call(rbind,.) %>% as.data.frame(.)
fd6 <- map2(quant[5,],aa,function(x,y) y[y > x & y <= max(y)] %>% .[!is.na(.)]) %>%
do.call(rbind,.) %>% as.data.frame(.)
NB: There are some duplicate values in the final fd1 - fd6
data frames (which is not, by the way, the best format to store values for this type of problem) but you can always filter them out by using for example unique
.
Hope this helps. Any modification to the answer is welcomed.
Upvotes: 0
Reputation: 969
Old school solution using matrix, list and nested loops.
# some artifical data with missings
set.seed(123)
fd <- data.frame(matrix(rnorm(406*48), nrow = 406, ncol = 48))
diag(fd) <- NA
# quant
quant <- apply(fd, 1, function(x)
quantile(t(x), probs = (0:6)/6, na.rm = TRUE, type = 6)
)
#matrix with selection
res <- list()
for (i in 1:6) {
mm <- matrix(NA, nrow = nrow(fd), ncol = ncol(fd)/6)
for (j in 1:nrow(fd)) {
lwr <- (quant[(i),j] < fd[j,])
upr <- (fd[j,] <= quant[(i+1),j])
if (i == 1)
z_j <- fd[j,][ upr ]
else
z_j <- fd[j,][ lwr & upr ]
z_j <- z_j[!is.na(z_j)]
mm[j,1:length(z_j)] <- sort(z_j)
}
res[[i]] <- mm
}
rm(i, mm, j, lwr, upr)
fd1 <- res[[1]]
Upvotes: 1