Reputation: 361
I am struggling to work out how to create a new variable based on several categorical variables that contain missing values.
I have the dataset below (simulated data):
df = data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  Disease_code_1 = c('I802', 'H356', 'G560', 'D235', 'B178', 'F011', 'F023', 'C761', 'H653', 'A049', 'J679'),
  Disease_code_2 = c('A071', 'NA', 'G20', 'NA', 'NA', 'A049', 'NA', 'NA', 'G300', 'G308', 'A045'),
  Disease_code_3 = c('H250', 'NA', 'NA', 'I802', 'NA', 'A481', 'NA', 'NA', 'NA', 'NA', 'D352'))
Which gives:
ID Disease_code_1 Disease_code_2 Disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1003 G560 G20 NA
4 1004 D235 NA I802
5 1005 B178 NA NA
6 1006 F011 A049 A481
7 1007 F023 NA NA
8 1008 C761 NA NA
9 1009 H653 G300 NA
10 1010 A049 G308 NA
11 1011 J679 A045 D352
I would like to create a new variable that assigns a 1 (disease present) to rows containing any of a subset of disease codes (e.g. G20, G300) and a 0 otherwise. I've attempted to follow previously answered Stack Overflow questions, with limited success:
df$test <- NA
df$test <- sapply(df[, 2:4],
                  FUN = function(x) recode(x, "'G20' =1; 'G300' =1",
                                           as.factor.result = FALSE))
Which results in the error:
Error: Argument 2 must be named, not unnamed
Ideally, I would like my dataset to look like this:
ID Disease_code_1 Disease_code_2 Disease_code_3 Disease_present
1 1001 I802 A071 H250 0
2 1002 H356 NA NA 0
3 1003 G560 G20 NA 1
4 1004 D235 NA I802 0
5 1005 B178 NA NA 0
6 1006 F011 A049 A481 0
7 1007 F023 NA NA 0
8 1008 C761 NA NA 0
9 1009 H653 G300 NA 1
10 1010 A049 G308 NA 0
11 1011 J679 A045 D352 0
Really appreciate any suggestions!
Upvotes: 0
Views: 103
Reputation: 3183
You can use apply() over the rows and check whether any of the disease-code columns contains one of the target codes, as below:
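# For each row (MARGIN = 1), look at every column except ID
df$Disease_present <- apply(df[, -1], 1, function(x) {
  # 1 if any code in the row is one of the target codes, otherwise 0
  if (any(x %in% c("G20", "G300"))) {
    return(1)
  } else {
    return(0)
  }
})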
df
ID Disease_code_1 Disease_code_2 Disease_code_3 Disease_present
1 1001 I802 A071 H250 0
2 1002 H356 NA NA 0
3 1003 G560 G20 NA 1
4 1004 D235 NA I802 0
5 1005 B178 NA NA 0
6 1006 F011 A049 A481 0
7 1007 F023 NA NA 0
8 1008 C761 NA NA 0
9 1009 H653 G300 NA 1
10 1010 A049 G308 NA 0
11 1011 J679 A045 D352 0
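As a possible alternative, the same check can be done column by column rather than row by row, which avoids apply() converting the data frame to a matrix. The sketch below is only an illustration: it uses a four-row stand-in for the question's data with real NA values instead of the string 'NA', and the names target_codes and code_cols are made up for this example. It relies on %in% returning FALSE for NA, so missing values never count as a match.

```r
# Codes that count as "disease present" (the two used in the expected output)
target_codes <- c("G20", "G300")

# Small stand-in for the question's data, with real NA for missing codes
df <- data.frame(
  ID = c(1001, 1002, 1003, 1009),
  Disease_code_1 = c("I802", "H356", "G560", "H653"),
  Disease_code_2 = c("A071", NA, "G20", "G300"),
  Disease_code_3 = c("H250", NA, NA, NA)
)

# Pick the code columns by name so that ID or any extra column is not scanned
code_cols <- grep("^Disease_code_", names(df), value = TRUE)

# One logical vector per column: TRUE where the value is a target code
# (%in% gives FALSE for NA, so missing values are never a match)
hits <- lapply(df[code_cols], `%in%`, target_codes)

# Combine the columns with element-wise OR, then convert TRUE/FALSE to 1/0
df$Disease_present <- as.integer(Reduce(`|`, hits))

df
```

Selecting the columns by name rather than with df[, -1] means a leftover helper column (such as the test column from the question) would not be included in the check.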
Upvotes: 1