hmnoidk

Reputation: 565

Combining Factor Levels from a Dataframe in R

I have a variable of type factor with three levels: Fatal injury, Non-fatal injury and P.D. only:

     head(OttawaCollisions$Collision_Classification)
[1] P.D. only        Non-fatal injury P.D. only        P.D. only        P.D. only        P.D. only       
Levels: Fatal injury Non-fatal injury P.D. only

How can I combine "Fatal injury" and "Non-fatal injury" into a single level so that fatalities get added to the injuries?

Better yet, could I even just remove the fatalities somehow? In that case I need each instance that is fatal to be removed from the data frame, not just coded NA or something.

Upvotes: 2

Views: 4708

Answers (2)

Sathish

Reputation: 12703

Data:

x <- factor( rep( c('P.D. only', 'Non-fatal injury' , 'fatal injury'), 2) )
x
# [1] P.D. only        Non-fatal injury fatal injury     P.D. only       
# [5] Non-fatal injury fatal injury    
# Levels: fatal injury Non-fatal injury P.D. only

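Code: You can merge levels by giving them the same label in the labels argument of factor(). Here Non-fatal injury and fatal injury are both relabelled as Fatalities. In R 3.4.0 and later, factor() accepts duplicated labels and merges them itself, so there is no warning and the droplevels() call is harmless but unnecessary. Older versions warn about duplicated levels (you can ignore the warning) and keep the duplicate until you call droplevels().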

x <- factor( x = x, 
             levels = c('P.D. only', 'Non-fatal injury' , 'fatal injury'),
             labels = c('P.D. only', 'Fatalities', 'Fatalities'))
# [1] P.D. only  Fatalities Fatalities P.D. only  Fatalities Fatalities
# Levels: P.D. only Fatalities Fatalities

droplevels(x)
# [1] P.D. only  Fatalities Fatalities P.D. only  Fatalities Fatalities
# Levels: P.D. only Fatalities

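EDIT: combined code for your data frame. The strings in levels must match the data exactly, including case (the question has "Fatal injury"), because any value not listed there becomes NA. Substitute your own column (Collision_Classification in the question) for CollisionClass.

OttawaCollisions$CollisionClass <- factor( x = OttawaCollisions$CollisionClass, 
                                           levels = c('P.D. only', 'Non-fatal injury' , 'Fatal injury'),
                                           labels = c('P.D. only', 'Fatalities', 'Fatalities'))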
OttawaCollisions$CollisionClass <- droplevels(OttawaCollisions$CollisionClass)

EDIT2: data.table solution.

library('data.table')
setDT(OttawaCollisions)
OttawaCollisions[ i = CollisionClass %in% c( "Fatal injury", "Non-fatal injury"), 
                  j = CollisionClass := "Fatalities"]
OttawaCollisions[, CollisionClass := droplevels(CollisionClass) ]

EDIT3: another base R solution. I prefer this one to the first (using labels in factor()) because you only name the levels you want to merge, which is easier when the data has many levels.

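# Convert to character so that replace() can write a value that is not yet a level
OttawaCollisions$CollisionClass <- as.character(OttawaCollisions$CollisionClass)
# Overwrite both injury classes (spelled exactly as in the data), then rebuild the factor
OttawaCollisions$CollisionClass <- factor( with(OttawaCollisions, 
                                                replace( CollisionClass, 
                                                         CollisionClass %in% c( "Fatal injury", "Non-fatal injury"),
                                                         "Fatalities") ) )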
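
A further base R option, sketched here on a toy factor rather than on your data: levels<- also accepts a named list, where each name is a new level and each value lists the old levels to fold into it. The label "Injury" is only an illustrative choice. Any old level left out of the list becomes NA, so list every level you want to keep.

```r
# Toy factor using the level names from the question.
x <- factor(c("P.D. only", "Non-fatal injury", "Fatal injury",
              "P.D. only", "Non-fatal injury", "Fatal injury"))

# Each list name is a new level; its value names the old levels merged into it.
# "Injury" is an illustrative label, not something from the original data.
levels(x) <- list("Injury"    = c("Fatal injury", "Non-fatal injury"),
                  "P.D. only" = "P.D. only")

levels(x)   # the two remaining levels, in the order given in the list
table(x)    # counts per merged level
```

The same assignment should work on a data frame column, e.g. levels(OttawaCollisions$CollisionClass) <- list(...), with no droplevels() step afterwards.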

Upvotes: 2

mysteRious

Reputation: 4294

You can also reassign levels directly:

> library(tibble)
> test_df <- tibble(x=as.factor(c('Fatal','Non-fatal','PD','Fatal','Non-fatal','PD')), y=1:6)
> test_df
# A tibble: 6 x 2
  x             y
  <fct>     <int>
1 Fatal         1
2 Non-fatal     2
3 PD            3
4 Fatal         4
5 Non-fatal     5
6 PD            6
> levels(test_df$x)
[1] "Fatal"     "Non-fatal" "PD"       

Now that you know the order, assign a new vector of level names in that same order. Levels that receive the same name are merged:

> levels(test_df$x) <- c("Fatal","Other","Other")
> test_df
# A tibble: 6 x 2
  x         y
  <fct> <int>
1 Fatal     1
2 Other     2
3 Other     3
4 Fatal     4
5 Other     5
6 Other     6

And then you can do additional processing, e.g.:

> library(dplyr)
> test_df %>% group_by(x) %>% summarize(n = n())
# A tibble: 2 x 2
  x         n
  <fct> <int>
1 Fatal     2
2 Other     4
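
The question also asks about dropping the fatal collisions entirely, which merging levels does not do. A minimal base R sketch, assuming a small hypothetical data frame (collisions, with a made-up id column) in place of OttawaCollisions: subset the rows first, then call droplevels() so the emptied level disappears from the factor as well.

```r
# Hypothetical stand-in for OttawaCollisions, using the question's column name.
collisions <- data.frame(
  Collision_Classification = factor(c("P.D. only", "Non-fatal injury", "Fatal injury",
                                      "P.D. only", "Non-fatal injury", "Fatal injury")),
  id = 1:6
)

# Keep only the rows that are not fatal (the rows are removed, not set to NA).
no_fatal <- collisions[collisions$Collision_Classification != "Fatal injury", ]

# Subsetting keeps "Fatal injury" as an empty level; droplevels() removes it.
no_fatal$Collision_Classification <- droplevels(no_fatal$Collision_Classification)

nrow(no_fatal)
levels(no_fatal$Collision_Classification)
```

If the column can contain NA, the != comparison yields NA for those rows and they come back as all-NA rows; wrapping the condition in which() is one way to avoid that.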

Upvotes: 1
