Carl O'Beirne

Convert Number to Factor using Labels in R

I have a column in my dataset that contains various numeric values. Three of the values have specific labels, while all the others share one general label. Going through the dataset one by one is not an option: it is a very large dataset with 167K observations.

Below shows all the unique values that are in the column:

> unique(NYC_2019_Arrests$JURISDICTION_CODE)
Levels: 0 1 2 3 4 6 7 9 11 12 13 14 15 16 69 71 72 73 74 76 79 85 87 88 97

The levels of JURISDICTION_CODE are defined as follows:

JURISDICTION_CODE - Jurisdiction responsible for the arrest. Jurisdiction codes 0 (Patrol), 1 (Transit) and 2 (Housing) represent the NYPD, whilst codes 3 and above represent non-NYPD jurisdictions.

This is the code I tried, but it just returns an error:

> NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0,1,2, 3:100), labels = c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction"))
Error in factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0, 1, 2,  : 
  invalid 'labels'; length 4 should be 1 or 101

I also tried the above code with 3:100 removed from levels while keeping the four labels, but that did not work either.

It would be greatly appreciated if anybody here knows how to give all values 3 and above the generic label without having to type out every number individually.

Thanks!

Answers (2)

pete_a_dunham

The error message points in the right direction: your labels vector has length 4, but your levels vector has length 101. You are almost there with the original code; just make labels the correct length:

reps<-rep("Non-NYPD Jurisdiction",98)
NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0,1,2, 3:100), labels = c("Patrol", "Transit", "Housing", reps))
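A minimal runnable sketch of this answer, using a small hypothetical data frame in place of the real 167K-row NYC_2019_Arrests. It computes the number of repeats as length(3:100) instead of hard-coding 98, and it relies on factor() merging duplicated labels into a single level, which current versions of R do.

```r
# Hypothetical toy stand-in for NYC_2019_Arrests (the real data has ~167K rows)
NYC_2019_Arrests <- data.frame(
  JURISDICTION_CODE = factor(c(0, 1, 2, 3, 16, 69, 97, 0))
)

codes <- c(0, 1, 2, 3:100)
# One label per code: three specific names, then the generic one repeated
# length(3:100) = 98 times so that length(labs) == length(codes)
labs <- c("Patrol", "Transit", "Housing",
          rep("Non-NYPD Jurisdiction", length(3:100)))

# Duplicated labels collapse into one level, giving a 4-level factor
NYC_2019_Arrests$JURISDICTION_CODE <- factor(
  NYC_2019_Arrests$JURISDICTION_CODE,
  levels = codes,
  labels = labs
)

levels(NYC_2019_Arrests$JURISDICTION_CODE)
table(NYC_2019_Arrests$JURISDICTION_CODE)
```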

Edit with explanation:

Run this code for additional explanation.

# The key is that labels must have the same length as levels

# length of levels
levels <- c(0,1,2, 3:100)
print(length(levels))
# length of the original labels
labels <- c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
print(length(labels))
# This is problematic: there are 101 levels but only 4 labels, and factor()
# requires length(labels) to be either 1 or length(levels), hence the error.
# Therefore, repeat "Non-NYPD Jurisdiction" once for each level from 3 to 100.
# length(3:100) is 98, which is how we know we need 98 copies
reps<-rep("Non-NYPD Jurisdiction",98)
labels <- c("Patrol", "Transit", "Housing", reps)
print(length(labels))
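One caveat with levels = c(0,1,2, 3:100): factor() turns any value not listed in levels into NA, so a code above 100 would silently be lost. A sketch of a variant (an alternative to the answer above, not the answerer's code) that builds one label per level actually present in the column, using a hypothetical vector in place of the real data; the code "150" is invented purely to show it is kept.

```r
# Hypothetical stand-in for NYC_2019_Arrests$JURISDICTION_CODE; "150" is an
# invented code above 100 to show that it is not dropped
code <- factor(c("0", "1", "2", "3", "69", "97", "0", "150"))

# Existing levels, sorted numerically so Patrol/Transit/Housing come first
lv <- levels(code)[order(as.numeric(levels(code)))]

# Specific names for codes 0, 1 and 2; every other level gets the generic label
named <- c("0" = "Patrol", "1" = "Transit", "2" = "Housing")
lab <- unname(named[lv])
lab[is.na(lab)] <- "Non-NYPD Jurisdiction"

# One label per existing level; duplicated labels are merged into one level
code_lab <- factor(code, levels = lv, labels = lab)

levels(code_lab)
```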

Magnus Nordmo

There are several ways to solve this. The simplest and best way I can think of is to use case_when() from dplyr. Here is an example:

library(dplyr)

case_when(mtcars$carb == 1 ~ "One",
          mtcars$carb == 2 ~ "Two",
          mtcars$carb >= 3 ~ "Three or More")
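Applying this to the question's column needs one extra step, because JURISDICTION_CODE is a factor: comparing a factor with >= returns NA (with a warning), and as.numeric() on a factor returns the internal level indices rather than the codes. A sketch assuming dplyr is installed, with a small hypothetical data frame standing in for NYC_2019_Arrests; the result is wrapped in factor() with explicit levels so the category order is fixed.

```r
library(dplyr)

# Hypothetical stand-in for the question's data: JURISDICTION_CODE is a factor
arrests <- data.frame(JURISDICTION_CODE = factor(c(0, 1, 2, 3, 69, 97)))

# Convert the factor's labels back to numbers before comparing them
code_num <- as.numeric(as.character(arrests$JURISDICTION_CODE))

arrests$JURISDICTION_CODE <- factor(
  case_when(
    code_num == 0 ~ "Patrol",
    code_num == 1 ~ "Transit",
    code_num == 2 ~ "Housing",
    code_num >= 3 ~ "Non-NYPD Jurisdiction"
  ),
  levels = c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
)

table(arrests$JURISDICTION_CODE)
```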
