Seo Jung Park

Reputation: 23

regrouping levels in one categorical variable

I am trying to simplify data analysis by combining levels of a categorical variable.

This variable has 6 levels. Let's say the variable is named "candle" and its levels are "Always", "Nearly always", "Sometimes", "Seldom", "Never", and "Never used", plus NA.

I wanted to regroup "Always" and "Nearly always" as "Yes", leave "Sometimes" as it is, and regroup "Seldom" and "Never" as "No".

I used:

data <- data %>%
  mutate(candle_new = ifelse(candle == "Always", "Yes",
                      ifelse(candle == "Nearly always", "Yes",
                      ifelse(candle == "Sometimes", "St",
                      ifelse(candle == "Never", "No",
                      ifelse(candle == "seldom", "No", NA))))))

It runs without any error message, but when I check the data, the recoding does not seem to have worked.

Could you help me to figure out what I did wrong?

Thank you!

Upvotes: 0

Views: 3189

Answers (4)

Edward Carney

Reputation: 1392

The car package has an elegant (IMO) recode function that works over multiple values.

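library(car)

yes.set <- c('Always','Nearly always')
no.set <- c('Seldom','Never','Never used')
# made up data, sampled from the levels listed in the question
candles <- c(yes.set, 'Sometimes', no.set)
data <- data.frame(vals=sample(candles,50,replace=TRUE))

# car::recode avoids a clash with dplyr::recode if dplyr is also loaded
data$vals<-car::recode(data$vals,"yes.set='Yes'; no.set='No'")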

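Anything that falls outside the desired sets can be set to NA with an else clause in the recode specification. To keep the "Sometimes" value, list it explicitly before the else. Run this on the original column rather than on an already recoded one, because else would also turn existing "Yes" and "No" values into NA.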

data$vals<-recode(data$vals,"yes.set='Yes'; no.set='No';'Sometimes'='Sometimes';else=NA")
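
For comparison, base R can collapse factor levels without an extra package by assigning a named list to levels(). This is a minimal sketch, not part of the car approach above; it assumes the column is a factor with the levels listed in the question, and the vector candle is made-up illustration data.

```r
# Made-up data using the levels listed in the question (assumption)
candle <- factor(c("Always", "Nearly always", "Sometimes", "Seldom",
                   "Never", "Never used", NA))

candle_new <- candle
# Each list element names a new level and lists the old levels merged into it.
# Old levels that are not mentioned (here "Never used") become NA.
levels(candle_new) <- list(
  Yes       = c("Always", "Nearly always"),
  Sometimes = "Sometimes",
  No        = c("Seldom", "Never")
)

# Cross-tabulate old against new values to check the mapping
table(candle, candle_new, useNA = "ifany")
```

The order of the list also fixes the order of the new levels, which can be convenient for plots and tables.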

Upvotes: 0

Florian

Reputation: 25385

I think instead of using ifelse, it would be more appropriate and legible to use match or left_join in this case.

So first we make a data.frame called match_df that looks as follows:

            old       new
1        Always       Yes
2 Nearly always       Yes
3     Sometimes Sometimes
4        Seldom        No
5         Never        No

And then we look up the new values from that data.frame. We could do that with either a left_join, or with match:

set.seed(2)
library(dplyr)

# the match dataframe (spellings must match the data exactly, including case)
match_df = data.frame(old = c('Always','Nearly always','Sometimes','Seldom','Never'),
                      new = c('Yes','Yes','Sometimes','No','No'))

# sample data
df = data.frame(candle = sample(match_df$old,12,TRUE))

# option 1, with match
df %>% mutate(candle_new = match_df$new[match(candle,match_df$old)])

# option 2, left_join
df %>% left_join(match_df,by=c('candle'='old')) %>% rename(candle_new=new)

Hope this helps!
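
The same lookup idea can also be sketched in base R with a named character vector instead of a data.frame. This is only an illustration under assumptions: the names lookup and df are made up, and the spellings follow the level list in the question.

```r
# Lookup table as a named character vector:
# names are the old values, elements are the new values
lookup <- c("Always"        = "Yes",
            "Nearly always" = "Yes",
            "Sometimes"     = "Sometimes",
            "Seldom"        = "No",
            "Never"         = "No")

# Made-up data with the levels from the question (assumption)
df <- data.frame(candle = c("Always", "Nearly always", "Sometimes", "Seldom",
                            "Never", "Never used", NA),
                 stringsAsFactors = FALSE)

# as.character() guards against factor columns, which would otherwise
# index the lookup by integer code instead of by label
df$candle_new <- unname(lookup[as.character(df$candle)])

df
```

Values that have no entry in the lookup, such as "Never used" and NA, come back as NA, which matches the behaviour of match.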

Upvotes: 1

MKR

Reputation: 20095

I can see it working when the strings match the data exactly (note "Seldom" with a capital "S" below). See the data and result.

data <- data.frame(id = 1:7, candle = c("Always", "Nearly always", "Sometimes", "Seldom", "Never", "Never used", NA))

library(dplyr)
data <- data %>%
  mutate(candle_new = ifelse(candle == "Always", "Yes",
                      ifelse(candle == "Nearly always", "Yes",
                      ifelse(candle == "Sometimes", "St",
                      ifelse(candle == "Never", "No",
                      ifelse(candle == "Seldom", "No", NA))))))

data
#  id        candle candle_new
#1  1        Always        Yes
#2  2 Nearly always        Yes
#3  3     Sometimes         St
#4  4        Seldom         No
#5  5         Never         No
#6  6    Never used       <NA>
#7  7          <NA>       <NA>

Upvotes: 0

felasa

Reputation: 145

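There's not enough information, but could it be that "seldom" in your nested ifelse has a lowercase "s"? String comparison in R is case-sensitive, so rows containing "Seldom" would not match and would fall through to NA.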
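
A quick way to check for this kind of mismatch is to compare the strings used in the recode with the values actually present in the data. The sketch below uses made-up data with the levels from the question; the names used, unmatched and typos are hypothetical.

```r
# Made-up data with the levels from the question (assumption)
candle <- c("Always", "Nearly always", "Sometimes", "Seldom",
            "Never", "Never used", NA)

"Seldom" == "seldom"   # FALSE: comparison is case-sensitive

# The strings tested in the nested ifelse from the question
used <- c("Always", "Nearly always", "Sometimes", "Never", "seldom")

# Values present in the data that the recode never matches
unmatched <- setdiff(unique(candle), used)
unmatched

# Strings in the recode that never occur in the data (likely typos)
typos <- setdiff(used, unique(candle))
typos
```

Anything listed in typos is a condition that can never be TRUE, so the rows it was meant to catch end up in the final NA branch.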

Upvotes: 0
