R function to partition data set

Question

can somebody help with debugging a function. It is meant to do

dat3 <- c(4,7,5,7,8,4,4,4,4,4,4,7,4,4,8,8,5,5,5,5)

myfunc(dat3, chunksize = 8)
##  [1] 4 7 5 8 4 4 4 4   4 7 5 8 4 4 5 5   4

partition the data in chunks of a sizer and make sure that every level is present in every chunk. The function works for the toy example

myfunc <- function(x, chunksize = 8) {
    numChunks <- ceiling(length(x) / chunksize)
    uniqx <- unique(x)
    lastChunkSize <- chunksize * (1 - numChunks) + length(x)
    ## check to see if it is mathematically possible
    if (length(uniqx) > chunksize)
        stop('more factors than can fit in one chunk')
    if (any(table(x) < numChunks))
        stop('not enough of at least one factor to cover all chunks')
    if (lastChunkSize < length(uniqx))
        stop('last chunk will not have all factors')
    ## actually arrange things in one feasible permutation
    allIndices <- sapply(uniqx, function(z) which(z == x))
    ## fill one of each unique x into chunks
    chunks <- lapply(1:numChunks, function(i) sapply(allIndices, `[`, i))
    remainder <- unlist(sapply(allIndices, tail, n = -3))
    remainderCut <- split(remainder, ceiling(seq_along(remainder)/4))
    ## combine them all together, wary of empty lists
    finalIndices <- sapply(1:numChunks,
           function(i) {
               if (i <= length(remainderCut))
                   c(chunks[[i]], remainderCut[[i]])
               else
                   chunks[[i]]
           })
           save(finalIndices,file="finalIndices")
    x[unlist(finalIndices)]

}

the problem is that I want to get the rearranged indixes from the function (so what is called here final Indices). The problem is that for my real data set with more observations (https://www.dropbox.com/s/n3wc5qxaoavr4ta/j.RData?dl=0), the function does not work.

The data as factor https://www.dropbox.com/s/0ue2xzv5e6h858q/t.RData?dl=0

I change the chunkszie paramter according to the number of levels present to 9847 I in the first line of the function). The problem is that when I access finalIndices from the saved file, I get a matrix with dim 137 60. Which does not provide an index for all my observations (nearly 600k). Could somebody tell me what am i doing wrong? I know that 60 is the number of chunks (nrows/chunksize) but 137 appear not to fit.

R function to partition data set

Answers (1)

Related Questions