screechOwl

Reputation: 28169

R function for a vector

I have a data frame, and I'm trying to take a factor variable, keep only its 31 most frequent levels, and collapse all the other levels into a single generic level.

I need to do this across several vectors, so I figured I'd create a function, but I'm not having much luck. I think I need to use mapply or Vectorize somehow, but I don't think I'm doing it properly: I get error messages about being unable to allocate 3.6 GB of memory.

This is the function, where x is the vector and topCount is the number of levels to keep:

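createFactor <- function(x, topCount){
    # count each level; inside this function the label column is named "x", not "Var1"
    table1 <- data.frame(table(x))
    # sort by frequency, most frequent first
    table1 <- table1[order(-table1$Freq), ]
    # labels of the topCount most frequent levels
    noChange <- as.character(table1[[1]][1:topCount])
    # as.character() so ifelse() returns labels rather than the factor's integer codes
    newVals1 <- factor(ifelse(x %in% noChange, as.character(x), "-1000"))
    newVals1
}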

I'd like to be able to write something like this:

df1$topLevels <- createFactor(df1$fact1, 31)

Any suggestions?

Upvotes: 0

Views: 1414

Answers (1)

joran

Reputation: 173717

I'm not completely certain about the performance characteristics of this, but I probably would have written this function more like so:

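topK <- function(x,k){
    # count per level; nbins = nlevels(x) keeps the counts aligned with levels(x)
    tbl <- tabulate(x, nbins = nlevels(x))
    names(tbl) <- levels(x)
    x <- as.character(x)
    # names of the k largest counts
    levelsToKeep <- names(tail(sort(tbl),k))
    # collapse every other level into '-1000'
    x[!(x %in% levelsToKeep)] <- '-1000'
    factor(x)
}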

where I've used tabulate rather than table because I suspect it may be faster (which seems important in your case), although I haven't tested how much faster it actually is.
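A minimal usage sketch, assuming a toy data frame with two hypothetical factor columns `fact1` and `fact2` (the real `df1` isn't shown in the question). It repeats `topK` so the snippet runs on its own, passes `nbins = nlevels(x)` to `tabulate` so the counts line up with `levels(x)` even when some levels never occur, and applies the function to several columns with `lapply`.

```r
# topK as above; nbins = nlevels(x) keeps counts aligned with levels(x)
topK <- function(x, k) {
  tbl <- tabulate(x, nbins = nlevels(x))     # count per level, in level order
  names(tbl) <- levels(x)
  x <- as.character(x)
  levelsToKeep <- names(tail(sort(tbl), k))  # names of the k largest counts
  x[!(x %in% levelsToKeep)] <- "-1000"       # collapse everything else
  factor(x)
}

# Hypothetical toy data standing in for df1
df1 <- data.frame(
  fact1 = factor(c("a", "a", "a", "b", "b", "c", "d")),
  fact2 = factor(c("x", "y", "y", "z", "z", "z", "w"))
)

# One column at a time, as in the question
df1$topLevels <- topK(df1$fact1, 2)

# Several columns at once: lapply hands each whole column to topK,
# so neither mapply nor Vectorize is needed
cols <- c("fact1", "fact2")
df1[paste0(cols, "_top")] <- lapply(df1[cols], topK, k = 2)

# A level that never occurs ("c") still gets a zero count
f <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
topK(f, 1)
```

When two levels tie for the last kept slot, which one survives depends on the order `sort` returns them in, not on any explicit rule.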

Upvotes: 3
