Reputation: 85
I have a column 'lg_with_children' in my data frame that has 6 levels: 'Half and half', 'Mandarin', 'Shanghainese', 'Other', 'N/A', and 'Not important'. I want to condense these 6 levels down to just 2 levels, 'Shanghainese' and 'Other'.
In order to do this I used the revalue() function from the plyr package to successfully rename the levels. I used the code below and it worked fine.
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("Mandarin" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("Half and half" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("N/A" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("Not important" = "Other"))
To condense the code a little, I went back to the data as it was before I revalued the levels and attempted to write a function. I tried the following after doing some research on how to write your own functions (I'm rather new at this).
revalue_factor_levels <- function(df, col, source, target) {df$col <- revalue(df$col, c("source" = "target"))}
I intentionally left the df, col, source, and target generic because I need to revalue some other columns in the same way.
Next, I tried to run the code filling in the args and get this message:
I am not quite sure what the problem is. I tried the following adjustment to the code, and it still did not work.
revalue_factor_levels <- function(df, col, source, target) {df$col <- revalue(df$col, c(source = target))}
Any guidance is appreciated. Thanks.
Upvotes: 1
Views: 268
Reputation: 85
Here is what I ended up with thanks to help from the community.
library(forcats)  # provides fct_other()
revalue_factor_levels <- function(df, col, ref_level) {
  df[[col]] <- fct_other(df[[col]], keep = ref_level)
  df
}
data <- revalue_factor_levels(data, "lg_with_children", "Shanghainese")
Upvotes: 0
Reputation: 6921
You can write your own function to recode the levels. The easiest way to do that is probably to change the levels directly with levels(fac) <- list(new_lvl1 = c(old_lvl1, old_lvl2), new_lvl2 = c(old_lvl3, old_lvl4)).
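As a minimal sketch of that base R approach, assuming a toy factor f that stands in for data$lg_with_children (the sample values are made up for illustration):

```r
# Toy factor standing in for data$lg_with_children (values are illustrative).
f <- factor(c("Mandarin", "Shanghainese", "N/A", "Half and half",
              "Not important", "Other", "Shanghainese"))

# Assigning a named list to levels() merges old levels into new ones:
# each name is a new level, each value lists the old levels it absorbs.
# Any old level left out of the list would become NA.
levels(f) <- list(
  Shanghainese = "Shanghainese",
  Other = c("Half and half", "Mandarin", "N/A", "Not important", "Other")
)

print(levels(f))
print(table(f))
```

Note that every old level has to appear somewhere in the list, which is one reason a helper that only asks for the levels to keep can be more convenient.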
But there are already several functions that do it out of the box. I typically use the forcats package to manipulate factors. Check out fct_recode from the forcats package (see its documentation).
There are also other functions that could help you - check out the comments below.
Now, as to why your code isn't working:
First, df$col looks for a column literally named col. The workaround is to use df[[col]] instead.
Second, the function has to return the modified data frame, so put df at the end of your function.
Third, c(source = target) will create a vector with one element named "source", regardless of what happens to be in the variable source.
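The column lookup and naming pitfalls can be seen in isolation with a small base R sketch. The data frame, its column name lg, and the values are hypothetical, and setNames() is shown only as one base R way to build the name from a variable's value:

```r
# Toy data frame; the column name and values are illustrative only.
df <- data.frame(lg = c("Mandarin", "Shanghainese"), stringsAsFactors = TRUE)
col <- "lg"
source <- "Mandarin"
target <- "Other"

# df$col looks for a column literally called "col", which does not exist here.
print(is.null(df$col))

# df[[col]] uses the value stored in col, so it finds the "lg" column.
print(levels(df[[col]]))

# c(source = target) names the element "source", ignoring the variable.
literal <- c(source = target)
print(names(literal))

# setNames() uses the value of the variable as the name instead.
by_value <- setNames(target, source)
print(names(by_value))
```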
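The solution is to create the vector c(source = target) in 2 steps.
library(plyr)  # provides revalue()
revalue_factor_levels <- function(df, col, source, target) {
  # Build the named vector in two steps so the name comes from the value of `source`
  to_rename <- target
  names(to_rename) <- source
  df[[col]] <- revalue(df[[col]], to_rename)
  # Return the modified data frame
  df
}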
Returning the df means the syntax is:
data <- revalue_factor_levels(data, "lg_with_children", "Mandarin", "Other")
I like functions that take the data as the first argument and return the modified data because they are pipeable.
library(dplyr)
data <- data %>%
  revalue_factor_levels("lg_with_children", "Mandarin", "Other") %>%
  revalue_factor_levels("lg_with_children", "Half and half", "Other") %>%
  revalue_factor_levels("lg_with_children", "N/A", "Other")
Still, using forcats is easier and less prone to breaking on edge cases.
Edit:
There is nothing preventing you from both using forcats and creating your custom function. For example, this is closer to what you want to achieve:
revalue_factor_levels <- function(df, col, ref_level) {
  df[[col]] <- forcats::fct_other(df[[col]], keep = ref_level)
  df
}
# Will keep Shanghainese and revalue all other levels to "Other".
data <- revalue_factor_levels(data, "lg_with_children", "Shanghainese")
Upvotes: 2