Dirk Calloway

Reputation: 2619

Rename factor levels based on a condition in R

I want to combine all factor levels with a count less than n into a single level named "Else".

For example, if n = 3, then in the following df I want to combine "c", "d" and "e" as "Else":

df = data.frame(x=c(1:10), y=c("a","a","a","b","b","b","c","d","d","e"), stringsAsFactors=TRUE)

I started out by getting a df with all the low count values:

library(plyr)
lowcounts = ddply(df, "y", function(z){if(nrow(z)<3) nrow(z) else NULL})

I know I could change these manually, but in practice I have dozens of levels, so I need to automate this.

I want to select and rename only those levels of df$y that appear in lowcounts$y and leave the rest unchanged, but I am not sure how to proceed.

Upvotes: 2

Views: 1578

Answers (2)

alexis_laz

Reputation: 13122

Another alternative:

#your dataframe (y must be a factor; since R 4.0 that needs stringsAsFactors = TRUE)
df = data.frame(x=c(1:10), y=c("a","a","a","b","b","b","c","d","d","e"), stringsAsFactors = TRUE)

#which levels to keep and which to change
res <- table(df$y)
notkeep <- names(res[res < 3])
keep <- names(res)[!names(res) %in% notkeep]
names(keep) <- keep

#set new levels
levels(df$y) <- c(keep, list("else" = notkeep))

df
#    x    y
#1   1    a
#2   2    a
#3   3    a
#4   4    b
#5   5    b
#6   6    b
#7   7 else
#8   8 else
#9   9 else
#10 10 else
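
The same idea can be wrapped in a small function so the threshold n and the replacement name are not hard-coded. This is only a sketch in base R: the name lump_rare_levels and its arguments are hypothetical, not part of any package. It relies on the fact that assigning the same name to several levels merges them.

```r
# Hypothetical helper: merge every level that occurs fewer than n times
# into one level called `other`. Unused levels (count 0) are merged too.
lump_rare_levels <- function(f, n, other = "Else") {
  f <- as.factor(f)
  counts <- table(f)
  rare <- names(counts)[counts < n]
  # Giving several levels the same name collapses them into one level.
  levels(f)[levels(f) %in% rare] <- other
  f
}

df <- data.frame(x = 1:10,
                 y = c("a", "a", "a", "b", "b", "b", "c", "d", "d", "e"),
                 stringsAsFactors = TRUE)
df$y <- lump_rare_levels(df$y, n = 3)
table(df$y)
```

With n = 3 this keeps "a" and "b" and folds "c", "d" and "e" into "Else", which matches the spelling asked for in the question.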

Upvotes: 3

TheComeOnMan

Reputation: 12875

Why not something like this?

library(data.table)
dt <- data.table(df)
#.N is the number of rows in each group of y; ynew is a character column
dt[, ynew := ifelse(.N < 3, "else", as.character(y)), by = "y"]
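
For comparison, the same per-group count can be computed without data.table. The following is a rough base R counterpart, not the code from this answer: ave() stands in for .N, and the variable name n_per_level is made up for illustration. The result is turned back into a factor at the end.

```r
df <- data.frame(x = 1:10,
                 y = c("a", "a", "a", "b", "b", "b", "c", "d", "d", "e"),
                 stringsAsFactors = TRUE)

# For every row, the number of rows that share its value of y.
n_per_level <- ave(seq_along(df$y), df$y, FUN = length)

# Keep the original label for frequent levels, use "else" for rare ones.
df$ynew <- factor(ifelse(n_per_level < 3, "else", as.character(df$y)))
table(df$ynew)
```

Unlike renaming the levels in place, this leaves the original y column untouched and adds a new column next to it.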

Upvotes: 2
