Reputation: 329
I have a column in my dataset that holds various numeric codes. Three of the codes have a specific label, while all the others share one general label. Going through the dataset one by one is not an option, since it is very large (167K observations).
Below are all the unique values in the column:
> unique(NYC_2019_Arrests$JURISDICTION_CODE)
Levels: 0 1 2 3 4 6 7 9 11 12 13 14 15 16 69 71 72 73 74 76 79 85 87 88 97
The levels of JURISDICTION_CODE are defined as follows:
JURISDICTION_CODE - Jurisdiction responsible for arrest. Jurisdiction codes 0(Patrol), 1(Transit) and 2(Housing) represent NYPD whilst codes 3 and more represent non NYPD jurisdictions.
This is the code I tried, but it just returns an error:
> NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0,1,2, 3:100), labels = c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction"))
Error in factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0, 1, 2, :
invalid 'labels'; length 4 should be 1 or 101
I also tried the above code with the 3:100 taken out and the label left in, but that did not work either.
I would greatly appreciate it if anybody here knows how to give all values 3 and above the generic label without having to type out every number individually.
Thanks!
Upvotes: 1
Views: 452
Reputation: 63
The error message points to the problem: the labels vector has length 4, but your levels vector has length 101. Your original code is almost there. Just extend the labels to the correct length:
reps<-rep("Non-NYPD Jurisdiction",98)
NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0,1,2, 3:100), labels = c("Patrol", "Transit", "Housing", reps))
Edit with explanation:
Run this code for additional explanation.
# The key is that labels must have the same length as levels
# Length of levels
levels <- c(0, 1, 2, 3:100)
print(length(levels))
# Length of the original labels
labels <- c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
print(length(labels))
# This is a problem: there are 101 levels but only 4 labels, so every level
# from the fifth onward (code 4 and up) has no label to pair with.
# Therefore we need to repeat "Non-NYPD Jurisdiction" once for each level in 3:100.
# length(3:100) is 98, which is how we know we need 98 repeats
reps <- rep("Non-NYPD Jurisdiction", 98)
labels <- c("Patrol", "Transit", "Housing", reps)
print(length(labels))
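If you would rather not hard-code 98 or guess an upper bound like 100, one option is to build the labels from the codes that actually occur in the column. This is only a minimal sketch. It assumes a small made-up `codes` factor standing in for NYC_2019_Arrests$JURISDICTION_CODE (the names `codes`, `code_num`, `observed`, `nypd`, `lab` and `jurisdiction` are just for illustration). It also assumes R 3.4.0 or later, which as far as I know is when factor() started accepting duplicated labels and merging them into a single level.

```r
# Toy stand-in for NYC_2019_Arrests$JURISDICTION_CODE (hypothetical values)
codes <- factor(c(0, 1, 2, 3, 72, 97, 1, 0))

# A factor stores level positions internally, so go through character
# to recover the actual numeric codes
code_num <- as.integer(as.character(codes))

# Codes that really occur in the data, in ascending order
observed <- sort(unique(code_num))

# Specific labels for the three NYPD codes; everything else is generic
nypd <- c("0" = "Patrol", "1" = "Transit", "2" = "Housing")
lab <- ifelse(observed <= 2,
              nypd[as.character(observed)],
              "Non-NYPD Jurisdiction")

# One label per observed level; repeated labels collapse into one level
jurisdiction <- factor(code_num, levels = observed, labels = lab)

print(table(jurisdiction))
```

Because the labels are derived from `observed`, their length always matches the levels, whatever codes turn up in the data.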
Upvotes: 1
Reputation: 951
There are several ways to solve this. The simplest and best way I can think of is to use case_when from dplyr.
Here is an example:
library(dplyr)
case_when(mtcars$carb == 1 ~ "One",
mtcars$carb == 2 ~ "Two",
mtcars$carb >= 3 ~ "Three or More")
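Applied to the jurisdiction codes from the question, the same pattern might look like the sketch below. It assumes dplyr is installed and uses a small made-up `codes` factor in place of NYC_2019_Arrests$JURISDICTION_CODE (`codes`, `code_num` and `jurisdiction` are illustrative names only). Since the column prints with "Levels:", it is presumably a factor, so it is converted back to numbers first.

```r
library(dplyr)

# Toy stand-in for NYC_2019_Arrests$JURISDICTION_CODE (hypothetical values)
codes <- factor(c(0, 1, 2, 3, 72, 97))

# as.integer() on a factor returns level positions, not the codes,
# so convert to character first
code_num <- as.integer(as.character(codes))

# Conditions are checked in order; the first match supplies the label
jurisdiction <- case_when(
  code_num == 0 ~ "Patrol",
  code_num == 1 ~ "Transit",
  code_num == 2 ~ "Housing",
  code_num >= 3 ~ "Non-NYPD Jurisdiction"
)

# Optional: turn the result into a factor with a fixed level order
jurisdiction <- factor(
  jurisdiction,
  levels = c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
)

print(jurisdiction)
```

Any value that matches none of the conditions (for example an NA code) comes out as NA, which makes unexpected codes easy to spot.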
Upvotes: 0