Waldir Leoncio

Reputation: 11341

How can I drop unused levels from a data frame?

Given the following mock data:

set.seed(123)
x <- data.frame(let = sample(letters[1:5], 100, replace = TRUE),
                num = sample(1:10, 100, replace = TRUE),
                stringsAsFactors = TRUE)  # needed in R >= 4.0.0 so that let is a factor
y <- subset(x, let != 'a')

Creating a table of y$let yields

a  b  c  d  e 
0 20 21 22 18

But I don't want a to show anymore. If I try to do this:

levels(y$let) <- factor(y$let)

I mess up the frequencies, since now table(y$let) gives me

b  d  c  e 
0 20 21 40 

I'm aware I could do xtabs(~ y$let, drop.unused.levels = TRUE) and work around the problem, but that doesn't reset the levels of the variable itself (which is important to me, since this is an early change I'm making to the dataset and it will carry through the whole analysis). Moreover, xtabs is a different class from table, which will give me headaches later in the project.

The question is: how can I automatically change levels(y$let) so it doesn't show levels that were dropped when I created the subset? In this case, how can I make it show [1] "b" "c" "d" "e"?

Upvotes: 53

Views: 85495

Answers (4)

Linda Marsh

Reputation: 105

The forcats package for working with factors is often a good choice.

library(forcats)
y$let <- fct_drop(y$let)

Upvotes: 2

CRich

Reputation: 118

Adding to Hong Ooi's answer, here is an example I found on R-Bloggers.

# Create some fake data
x <- as.factor(sample(head(colors()), 100, replace = TRUE))
levels(x)
x <- x[x != "aliceblue"]
levels(x) # still the same levels
table(x)  # even though one level has 0 entries!

The solution is simple: run factor() again:
x <- factor(x)
levels(x)

Upvotes: 3

Señor O

Reputation: 17412

R has a built-in function for this, droplevels() (added in R 2.12.0). Applied to a data frame, it drops the unused levels of every factor column:

y <- droplevels(y)
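
For a quick check of what that call does, here is a minimal sketch on a small made-up data frame (the values are hypothetical and chosen by hand rather than sampled, so the result doesn't depend on the R version or the random seed):

```r
# Hypothetical toy data; let is created as a factor explicitly
x <- data.frame(let = factor(c("a", "b", "c", "b", "a", "c")),
                num = 1:6)
y <- subset(x, let != "a")

levels(y$let)        # "a" "b" "c": "a" is still listed although no row uses it

y <- droplevels(y)   # drops unused levels from every factor column of y

levels(y$let)        # "b" "c"
table(y$let)         # counts for b and c only, no empty "a" cell
```

Because the levels are changed on the data frame itself, table() and anything else that runs on y afterwards should see only the levels that are actually present.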

Upvotes: 145

Hong Ooi

Reputation: 57686

Just do y$let <- factor(y$let). Running factor on an existing factor variable will reset the levels to only those that are present.
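
One detail that may be worth checking if your levels have a deliberate order: calling factor() on an existing factor keeps the remaining levels in their original order instead of re-sorting them alphabetically. A small sketch with a made-up factor (the names f and g are only for illustration):

```r
# Hypothetical factor with a deliberate, non-alphabetical level order
f <- factor(c("low", "high", "mid", "high"),
            levels = c("low", "mid", "high"))

g <- f[f != "mid"]   # subsetting keeps all three levels
levels(g)            # "low" "mid" "high"

g <- factor(g)       # re-create the factor from the values that remain
levels(g)            # "low" "high": unused level gone, order preserved
```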

Upvotes: 23
