Friederike
Friederike

Reputation: 31

Split data by row in R in quantiles

I have a data.frame called fd with 406 rows and 48 columns. For each row in fd I want to compute ntiles (sixtiles). I do this the following way:

quant <- apply(fd, 1,  function(x) quantile(t(x), probs = c(1/6, 2/6, 0.5, 4/6, 5/6), na.rm = TRUE ))

What I now want to do, is split my original data into 6 new dataframes, i.e. fd1 to fd6, where in fd1 I have all the observations of the first sixtile, in fd2 I have all the observations of the second sixtile and so on. Again, I want to do this rowwise. Meaning, I want my algorithm/function to look at each row of fd and do the following:

Take all the observations of the first sixtile, in the first row of fd, and store them into the first row of fd1, then take the first sixtile of the second row of fd and store them in the second row of fd1.

Important to note: I do not have observations for each row and column, so in some I have missing data (NA)

Could anybody give hints on how I can achieve this? Thanks in advance.

Upvotes: 1

Views: 568

Answers (2)

Vitali Avagyan
Vitali Avagyan

Reputation: 1203

Here is a relatively shorter way of achieving this using purrr and dplyr packages:

library(dplyr)
library(purrr)

# some random example
df <- data.frame(matrix(runif(48),405,48))
df[3,5] <- NA
df[10,25:26] <- NA

quant <- apply(df, 1,  function(x) aa <- quantile(t(x), probs = c(1/6, 2/6, 3/6, 4/6, 5/6), na.rm = TRUE ))

aa <- as.data.frame(t(df))
fd1 <- map2(quant[1,],aa,function(x,y) y[y <= x] %>% .[!is.na(.)]) %>% 
  do.call(rbind,.)%>% as.data.frame(.)
fd2 <- pmap(list(quant[1,],quant[2,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>% 
  do.call(rbind,.) %>% as.data.frame(.)
fd3 <- pmap(list(quant[2,],quant[3,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>% 
  do.call(rbind,.) %>% as.data.frame(.)
fd4 <- pmap(list(quant[3,],quant[4,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>% 
  do.call(rbind,.) %>% as.data.frame(.)
fd5 <- pmap(list(quant[4,],quant[5,],aa),function(x,y,z) z[z > x & z <= y] %>% .[!is.na(.)]) %>% 
  do.call(rbind,.) %>% as.data.frame(.)
fd6 <- map2(quant[5,],aa,function(x,y) y[y > x & y <= max(y)] %>% .[!is.na(.)]) %>% 
  do.call(rbind,.) %>% as.data.frame(.)

NB: There are some duplicate values in the final fd1 - fd6 data frames (which is not, by the way, the best format to store values for this type of problem) but you can always filter them out by using for example unique.

Hope this helps. Any modification to the answer is welcomed.

Upvotes: 0

Jan Marvin
Jan Marvin

Reputation: 969

Old school solution using matrix, list and nested loops.

# some artifical data with missings
set.seed(123)
fd <- data.frame(matrix(rnorm(406*48), nrow = 406, ncol = 48))
diag(fd) <- NA

# quant
quant <- apply(fd, 1,  function(x)
  quantile(t(x), probs = (0:6)/6, na.rm = TRUE, type = 6)
)

#matrix with selection
res <- list()
for (i in 1:6) {
  mm <- matrix(NA, nrow = nrow(fd), ncol = ncol(fd)/6)
  for (j in 1:nrow(fd)) {
    lwr <- (quant[(i),j] < fd[j,])
    upr <- (fd[j,] <= quant[(i+1),j])
    if (i == 1)
      z_j <- fd[j,][ upr ]
    else
      z_j <- fd[j,][ lwr & upr ]

    z_j <- z_j[!is.na(z_j)]
    mm[j,1:length(z_j)] <- sort(z_j)
  }
  res[[i]] <- mm
}
rm(i, mm, j, lwr, upr)

fd1 <- res[[1]]

Upvotes: 1

Related Questions