Reputation: 51
I just started this programming, apologize for asking this simple question but I am stuck.
I have a data.table called s3:
s3:
ClaimID dx dxgroup
15nhbfcgcda 113.8 NA
15nhbfcgcda 156.8 NA
15nhbfcgcda 110.8 059
15nhbfcfssa 135.8 NA
15nhb4dfgda V70.3 NA
15nhbf644da 118.8 042
S3 has 30000 rows.
I want to apply this logic:
If dxgroup = NA(
If dx (fisrt 4 characters match with)= (2024, 2967, 9786,9788,8263)
then dxgroup = (first 4 character of dx)
else dx (fisrt 3 characters match with) = (V70, 042,897)
then dxgroup = (first 3 character of dx)
else dxgroup = dx
)
Result should be like :
ClaimID dx dxgroup
15nhbfcgcda 113.8 113.8
15nhbfcgcda 156.8 156.8
15nhbfcgcda 110.8 059
15nhbfcfssa 135.8 135.8
15nhb4dfgda V70.3 V70
15nhbf644da 118.8 042
Please advice ?
I apologize: It is my first time I am asking something here, so not used to yet. So I did something like this(I have no if this is correct and I got error as well):
sample4<-sample3[, dxgroup := { if (dxgroup == NA)
- { if (substring(sample3$dx,1,4) == list (2501,2780,4151,5301,5751,6860,7807,7890,9898,9955,9970)) substring(sample3$dx,1,4)
- else if (substring(sample3$dx,1,3) == list (042,493,682,850,V72)) substring(sample3$dx,1,3)
- else if (substring(sample3$dx,1,4) == list (8540, 8541)) substring(sample3$dx,1,3)
- else if (substring(sample3$dx,1,3) == list (043, 044)) 042
- else if (substring(sample3$dx,1,3) == list (789) & substring(sample3$dx,1,3) != list(7891,7893,78930)) 7890
- else if (substring(sample3$dx,1,4) == list (7865) & substring(sample3$dx,1,4) != list(78651,78652,78659)) 78650}
- else sample3$dx}] Error in if (dxgroup == NA) { : missing value where TRUE/FALSE needed In addition: Warning message: In if (dxgroup == NA) { : the condition has length > 1 and only the first element will be used
Upvotes: 5
Views: 8394
Reputation: 30331
This is a more data.table
friendly solution:
library(data.table)
s3 <- data.table(s3)
s3[is.na(dxgroup) & (substring(ClaimID, 1, 3) %in% ("V70", "042", "897")), dxgroup := substring(dx, 1, 3)]
s3[is.na(dxgroup) & (substring(ClaimID, 1, 4) %in% ("2024", "2967", "9786", "9788", "8263")), dxgroup := substring(dx, 1, 4)]
s3[is.na(dxgroup), dxgroup := dx] #Default
Basically, you work from the most specific condition to the most global condition, as each line in the above script potentially overwrites matches from the previous line.
(I assume you're using the data.table package. data.tables
are different than data.frames
, and in my opinion much better.)
Upvotes: 3
Reputation: 55420
You have the logic all set.
Note that with data.table (well, almost all of R
), you can wrap the j
in {curly brackets}
and the final statement in the brackets is what will be assigned. eg:
DT[, dxgroup := { if (clause1)
{if (foo) beebar else bar}
else chewybar
}
]
Upvotes: 7