Reputation: 1396
I have a data frame with a variable called "Control_Category". The variable has six names in it, which for simplicity's sake I am going to make generic:
df <- data.frame(Control_Category = c("Really Long Name One",
                                      "Super Really Long Name Two",
                                      "Another Really Flippin' Long Name Three",
                                      ",Seriously, It's a Fourth Long Name",
                                      "Definitely a Fifth Long Name",
                                      "Finally, This guy is done, number six"))
I'm using this to make a slight joke. So, while the names are long, they are tidy in that the values for each (1-6) are consistent. In this character column of the data frame, there are hundreds and hundreds of entries, each matching one of those six.
What I need to do is to replace the long names with a short name. Therefore, where any of the above names are identified, replace that name with a shorter version, like:
One Two Three Four Five Six
I tried a function using 'case_when' and it failed miserably. Any help would be appreciated.
Additional Information Based on Questions From Community
The order of the items doesn't matter. There isn't a designation of 1 - 6. There just happen to be six and I made six stupid long strings. The strings themselves are long.
So, anywhere "Super Really Long Name Two" exists, that value needs to be updated to something like "TWO" or a "Short_Name" that approximates "TWO". In reality, the category is called "Audit, Testing and Examination Results". The short name would ideally just be "AUDIT".
Upvotes: 3
Views: 513
Reputation: 659
Here's a larger data frame with long names:
set.seed(101)
long_names <- c("Really Long Name One",
                "Super Really Long Name Two",
                "Another Really Flippin' Long Name Three",
                ",Seriously, It's a Fourth Long Name",
                "Definitely a Fifth Long Name",
                "Finally, This guy is done, number six")
df <- data.frame(control_category=sample(long_names, 100, replace=TRUE))
head(df)
## control_category
## 1 Another Really Flippin' Long Name Three
## 2 Really Long Name One
## 3 Definitely a Fifth Long Name
## 4 ,Seriously, It's a Fourth Long Name
## 5 Super Really Long Name Two
## 6 Super Really Long Name Two
Using the unique() function will give you the category names:
category <- unique(df$control_category)
print(category)
## [1] Another Really Flippin' Long Name Three
## [2] Really Long Name One
## [3] Definitely a Fifth Long Name
## [4] ,Seriously, It's a Fourth Long Name
## [5] Super Really Long Name Two
## [6] Finally, This guy is done, number six
## 6 Levels: ,Seriously, It's a Fourth Long Name ...
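Notice that the levels are in alphabetical order (see levels(category)), while category itself lists the values in order of first appearance. (This assumes control_category is a factor, which data.frame() produces by default only in R versions before 4.0; the code below works for a character column too.) The simplest way to pair long and short names is to reorder manually by looking at the printed output: here, category[c(2, 5, 1, 4, 3, 6)] puts the long names in the order one through six. Finally,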
df$control_category <- factor(
  df$control_category,
  levels=category[c(2, 5, 1, 4, 3, 6)],
  labels=c("one", "two", "three", "four", "five", "six")
)
head(df)
## control_category
## 1 three
## 2 one
## 3 five
## 4 four
## 5 two
## 6 two
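If working out the index order by eye feels fragile, one alternative is to spell out the mapping as a named character vector and index into it. This is only a minimal sketch, not part of the approach above: the names short_names and short_category are made up for illustration, and the example rebuilds the sample data so that it runs on its own.

```r
# Rebuild the example data (same long names as above).
long_names <- c("Really Long Name One",
                "Super Really Long Name Two",
                "Another Really Flippin' Long Name Three",
                ",Seriously, It's a Fourth Long Name",
                "Definitely a Fifth Long Name",
                "Finally, This guy is done, number six")
set.seed(101)
df <- data.frame(control_category = sample(long_names, 100, replace = TRUE))

# Hypothetical lookup table: names are the long strings, values the short ones.
short_names <- setNames(c("one", "two", "three", "four", "five", "six"),
                        long_names)

# as.character() guards against the column being a factor, in which case
# indexing would use the integer codes instead of the strings.
df$short_category <- unname(short_names[as.character(df$control_category)])

# Cross-tabulate to check that each long name maps to exactly one short name.
table(df$control_category, df$short_category)
```

Because each long name is written next to its short name once, adding or renaming a category only means editing the two vectors that build the lookup table.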
Upvotes: 2
Reputation: 520918
You could just use gsub() once for each replacement:
df$Control_Category <- gsub('Really Long Name One', 'One', df$Control_Category)
You can repeat similar logic to handle the other five long/short name pairs.
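Repeating that call for all six pairs can also be written as a loop. The following is a minimal sketch, assuming the six long names from the question; the vector name replacements is made up for illustration, and fixed = TRUE is an addition here so that each long name is matched as literal text rather than as a regular expression.

```r
# Example data from the question.
df <- data.frame(Control_Category = c("Really Long Name One",
                                      "Super Really Long Name Two",
                                      "Another Really Flippin' Long Name Three",
                                      ",Seriously, It's a Fourth Long Name",
                                      "Definitely a Fifth Long Name",
                                      "Finally, This guy is done, number six"))

# Hypothetical mapping: names are the long strings, values the short ones.
replacements <- c("Really Long Name One"                    = "One",
                  "Super Really Long Name Two"              = "Two",
                  "Another Really Flippin' Long Name Three" = "Three",
                  ",Seriously, It's a Fourth Long Name"     = "Four",
                  "Definitely a Fifth Long Name"            = "Five",
                  "Finally, This guy is done, number six"   = "Six")

# One gsub() call per long/short pair; fixed = TRUE treats the pattern literally.
for (long in names(replacements)) {
  df$Control_Category <- gsub(long, replacements[[long]],
                              df$Control_Category, fixed = TRUE)
}

print(df$Control_Category)
```

Note that gsub() returns a character vector, so if Control_Category started out as a factor it will be a plain character column afterwards.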
Upvotes: 4