user16584364

Reputation: 13

How do I assign more than 2 numerical categories in R to a single response?

I am quite new to R. I have a hospital dataset where patients are assigned categories based on diagnosis. For example

And so on with 8 diseases in total. The "/" here represents the presence of more than 1 disease. I want to categorize them numerically such that dis A=1, dis B=2 and so on. The above data needs to be:

I have tried sapply and converting the column to a factor with levels, but the best I can get is a correct classification for single diseases only. The combined diseases return NULL. Is there a way to do this? Please help!

Here is a sample:

structure(list(Classification = c("IHD/other/cardiopulmonary", 
"IHD", "hypertensive", "IHD/other", "IHD/other", "IHD/other/CVA"
), Comorbidities = c("DM", "HT+DM", "HT+DM", NA, NA, "HT+DM"), 
    Diagnosis = c("CORONARY ARTERY DISEASE WITH MITRAL REGURGITATION WITH TRICUSPID REGURGITATION WITH PULOMNARY HYPERTENSION WITH DYSFUNCTION LEFT VENTRICLE WITH DIABETES MELLITUS", 
    "ACUTE CORONARY SYNDROME WITH ANTERIOR WALL MYOCARDIAL INFARCTION WITH CARDIOGENIC SHOCK WITH BLEEDING DIATHESIS WITH DIABETES MELLITUS WITH HYPERTENSION", 
    "ASPIRATION PNEUMONTIS WITH RESPIRATORY FALIURE WITH HYPERTENSION WITH HYPONATERMIA WITH DIABETES MELLITUS", 
    "ACUTE CORONARY SYNDROME WITH RIGHT BUNDLE BRANCH BLOCK WITH ANTERIOR WALL MYOCARDIAL INFARCTION WITH CARDIOGENIC SHOCK", 
    "COMPLETE HEART BLOCK WITH CARDIAC ARREST WITH INTERIOR WALL MYOCARDIAL INFARCTION", 
    "DIABETES MELLITUS WITH CORONARY ARTERY DISEASE WITH HYPERTENSION SYSTEMIC WITH ATRIAL FIBRILATION WITH PULMONARY TUBERCULOSIS WITH CEREBRO VASCULAR ACCIDENT WITH CARDIOGENIC SHOCK"
    )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 69

Answers (3)

iago

Reputation: 3266

str_replace_all() from stringr accepts a named vector of pattern = replacement pairs, so this works:

stringr::str_replace_all(string = c("1 dis A/dis C 2 dis B 3 dis A/dis B/dis C", "dis A/dis B"), pattern = c('dis A' = '1', 'dis B' = '2','dis C' = '3'))
[1] "1 1/3 2 2 3 1/2/3" "1/2"    

Update

With the example data:

stringr::str_replace_all(string = df$Classification, pattern = c('IHD' = '1', 'other' = '2','cardiopulmonary' = '3', 'hypertensive' = '4', 'CVA'='5'))
[1] "1/2/3" "1"     "4"     "1/2"   "1/2"   "1/2/5"

So, in order to update your data, you can do:

df$Classification <- stringr::str_replace_all(string = df$Classification, pattern = c('IHD' = '1', 'other' = '2','cardiopulmonary' = '3', 'hypertensive' = '4', 'CVA'='5'))
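One thing to watch with the call above: the names are treated as regular expressions and match anywhere in the string, so a label that happens to be contained in another label would be replaced too. A minimal base R sketch that matches whole labels only, assuming the labels are always separated by "/", could look like this. The helper name recode_labels and the vector names are made up for illustration, and the values are copied from the sample in the question:

```r
# Sample values copied from the question's Classification column.
classification <- c("IHD/other/cardiopulmonary", "IHD", "hypertensive",
                    "IHD/other", "IHD/other", "IHD/other/CVA")

# Fixed disease-to-code mapping (same numbering as in the answer above).
codes <- c(IHD = 1L, other = 2L, cardiopulmonary = 3L,
           hypertensive = 4L, CVA = 5L)

# Hypothetical helper: split on the separator, look up each whole label
# by name, and paste the codes back together.
recode_labels <- function(x, codes, sep = "/") {
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    # keep missing values missing instead of turning them into "NA"
    if (length(parts) == 1L && is.na(parts)) return(NA_character_)
    paste(unname(codes[parts]), collapse = sep)
  }, character(1))
}

recode_labels(classification, codes)
# "1/2/3" "1" "4" "1/2" "1/2" "1/2/5"
```

A label that is not in codes comes back as the text "NA" inside the result, which makes typos in the data easy to spot.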

Upvotes: 0

Ronak Shah

Reputation: 389065

Here is one base R option which should work for any number of diseases without manually specifying a number for them.

#split the string on '/'
split_vals <- strsplit(df$Classification, '/')
#Get the unique values
all_vals <- unique(unlist(split_vals))
#Use match to get a unique number for each value.
df$Classification <- sapply(split_vals, function(x) 
                            paste(match(x, all_vals),collapse = '/'))
df

# Classification Comorbidities Diagnosis                                                                 
#  <chr>          <chr>         <chr>                                                                     
#1 1/2/3          DM            CORONARY ARTERY DISEASE WITH MITRAL REGURGITATION WITH TRICUSPID REGURGIT…
#2 1              HT+DM         ACUTE CORONARY SYNDROME WITH ANTERIOR WALL MYOCARDIAL INFARCTION WITH CAR…
#3 4              HT+DM         ASPIRATION PNEUMONTIS WITH RESPIRATORY FALIURE WITH HYPERTENSION WITH HYP…
#4 1/2            NA            ACUTE CORONARY SYNDROME WITH RIGHT BUNDLE BRANCH BLOCK WITH ANTERIOR WALL…
#5 1/2            NA            COMPLETE HEART BLOCK WITH CARDIAC ARREST WITH INTERIOR WALL MYOCARDIAL IN…
#6 1/2/5          HT+DM         DIABETES MELLITUS WITH CORONARY ARTERY DISEASE WITH HYPERTENSION SYSTEMIC…
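Because match() numbers the diseases in order of first appearance, the codes depend on the row order of the data. It may help to store the key next to the result so the numbers can be decoded later. A small sketch, assuming the same split as above; the name key is made up, and the values are copied from the sample in the question:

```r
# Sample values copied from the question's Classification column.
classification <- c("IHD/other/cardiopulmonary", "IHD", "hypertensive",
                    "IHD/other", "IHD/other", "IHD/other/CVA")

split_vals <- strsplit(classification, "/", fixed = TRUE)
all_vals <- unique(unlist(split_vals))

# Lookup table: one row per disease, code = position in all_vals.
key <- data.frame(code = seq_along(all_vals),
                  disease = all_vals,
                  stringsAsFactors = FALSE)
key$disease
# "IHD" "other" "cardiopulmonary" "hypertensive" "CVA"
```

With this table, code 4 can be read back as "hypertensive" even if the data is later re-sorted.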

Upvotes: 1

dy_by

Reputation: 1241

A simple substitution with mgsub from qdap:

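# disease labels and the numeric code for each, in matching order
dis <- c('dis A','dis B','dis C','dis D','dis E','dis F','dis G','dis H')
codes <- 1:8  # named `codes` so it does not mask base::class()

library(dplyr)
library(qdap)

# overwrite the column with the recoded values
dt %>% 
  mutate(`Disease classification` = mgsub(dis, codes, `Disease classification`))

# or keep the original column and write the result to a new one
# dt %>% 
#   mutate(`NEW Disease classification` = mgsub(dis, codes, `Disease classification`))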
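If installing qdap is not an option, roughly the same multiple substitution can be sketched in base R with a loop over gsub(). This is an approximation of the idea, not qdap's implementation; the input values are made up in the style of the question, and the names x, codes and recoded are illustrative:

```r
dis <- paste("dis", LETTERS[1:8])   # "dis A" ... "dis H"
codes <- as.character(1:8)

# Made-up values in the same style as the question.
x <- c("dis A/dis C", "dis B", "dis A/dis B/dis C")

# Replace each label in turn; fixed = TRUE treats the label as plain text,
# not as a regular expression.
recoded <- x
for (i in seq_along(dis)) {
  recoded <- gsub(dis[i], codes[i], recoded, fixed = TRUE)
}
recoded
# "1/3" "2" "1/2/3"
```

Since the replacements run one after another, this only stays safe as long as no code contains text that a later label would match, which holds for plain digits here.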

Upvotes: 0
