Seydou GORO
Seydou GORO

Reputation: 1285

Transform dummy variable into categorical variable

Here is my dataframe

data<-data.frame(
ID=c(1:8),
Diag1=c(1,0,1,0,1,0,1,0),
Diag2=c(0,1,0,1,0,0,1,0),
Diag3=c(0,0,0,1,0,1,1,0),
Multiple.Diag=c(0,0,1,1,0,0,1,0)
)

I have patients with different diagnoses, some of them have multiple diagnoses. These diagnoses are dummy variables that need to be converted into categorical variables. If the patient has Mult.diag==1, his diagnosis will be Multiple.diag, otherwise his diagnosis will be either Diag1, Diag2 or Diag3. If the patient has 0 for the whole variables, the diagnosis will be "Other".

Here is what I want to have:

  ID     Diagnosis
1  1         Diag1
2  2         Diag2
3  3 Multiple.Diag
4  4 Multiple.Diag
5  5         Diag1
6  6         Diag3
7  7 Multiple.Diag
8  8         Other

Upvotes: 0

Views: 826

Answers (3)

akrun
akrun

Reputation: 887951

Here is an option with max.col to find the index of the column with the highest value in each row of subset of 'Diag' columns, use that index to get the column name, then change those elements to 'Multiple.Diag' where it is 1 and if there are no 1s in a row, we categorize it as 'Other' using a logical expression created with rowSums

-codes

out <- cbind(data['ID'], Diagnosis = names(data)[-1][
              max.col(data[startsWith(names(data), 'Diag')], 'first')])
i1 <- as.logical(data$Multiple.Diag)
out$Diagnosis[i1] <- 'Multiple.Diag'
i2 <- rowSums(data[-1]) == 0
out$Diagnosis[i2] <- 'Other'

-output

out
#  ID     Diagnosis
#1  1         Diag1
#2  2         Diag2
#3  3 Multiple.Diag
#4  4 Multiple.Diag
#5  5         Diag1
#6  6         Diag3
#7  7 Multiple.Diag
#8  8         Other

max.col is vectorized and should be very fast


Or another option is to use which with arr.ind = TRUE

m1 <- which(data[-1] == 1, arr.ind = TRUE)
cbind(data['ID'], Diagnosis = ifelse(rowSums(data[-1]) == 0, 
    'Other', names(data)[-1][tapply(m1[, 2], m1[,1], FUN = max)]))

-output

#  ID     Diagnosis
#1  1         Diag1
#2  2         Diag2
#3  3 Multiple.Diag
#4  4 Multiple.Diag
#5  5         Diag1
#6  6         Diag3
#7  7 Multiple.Diag
#8  8         Other

Or using tidyverse using the same method

library(dplyr)
data %>% 
   transmute(ID, Diagnosis = case_when(rowSums(.[-1]) == 0 ~ "Other", 
         TRUE ~ names(.)[-1][max.col(.[-1], 'last')]))

-output

#  ID     Diagnosis
#1  1         Diag1
#2  2         Diag2
#3  3 Multiple.Diag
#4  4 Multiple.Diag
#5  5         Diag1
#6  6         Diag3
#7  7 Multiple.Diag
#8  8         Other

Or another option where we use rowwise along with c_across

data %>%
   rowwise %>% 
   transmute(ID, Diagnosis = coalesce(case_when(as.logical(Multiple.Diag) 
    ~ 'Multiple.Diag',
     TRUE ~ names(.)[-1][as.logical(c_across(-1))][1] ), 'Other'))

-output

# A tibble: 8 x 2
# Rowwise: 
#     ID Diagnosis    
#  <int> <chr>        
#1     1 Diag1        
#2     2 Diag2        
#3     3 Multiple.Diag
#4     4 Multiple.Diag
#5     5 Diag1        
#6     6 Diag3        
#7     7 Multiple.Diag
#8     8 Other        

Upvotes: 1

Onyambu
Onyambu

Reputation: 79348

with tidyverse you could also do:

data %>% 
  pivot_longer(-ID) %>%
  group_by(ID) %>%
  slice(which.max(as.integer(factor(name))*value))%>%
  mutate(name = if_else(value == 0, 'other',name), value= NULL)
 # A tibble: 8 x 2
# Groups:   ID [8]
     ID name         
  <int> <chr>        
1     1 Diag1        
2     2 Diag2        
3     3 Multiple.Diag
4     4 Multiple.Diag
5     5 Diag1        
6     6 Diag3        
7     7 Multiple.Diag
8     8 other  

Upvotes: 1

Duck
Duck

Reputation: 39613

You can use apply() and built-in function like this. You can add the result to your original data or saving in a new dataframe. The function myfunc extract the name of the variable for treatment based on the directions you mentioned. Here the code:

#Code
myfunc <- function(x)
{
  y <- names(x)[max(which(x==1))]
  if(is.na(y))
  {
    y <- 'Others'
  }
  return(y)
}
#Add var
data$Var <- apply(data[,-1],1,myfunc)

Output:

  ID Diag1 Diag2 Diag3 Multiple.Diag           Var
1  1     1     0     0             0         Diag1
2  2     0     1     0             0         Diag2
3  3     1     0     0             1 Multiple.Diag
4  4     0     1     1             1 Multiple.Diag
5  5     1     0     0             0         Diag1
6  6     0     0     1             0         Diag3
7  7     1     1     1             1 Multiple.Diag
8  8     0     0     0             0        Others

Upvotes: 1

Related Questions