Reputation: 17
I need to re-categorize codes that represent various diseases so as to form appropriate groups for later analysis.
Many of the groupings include ranges that look like this:
1.0 to 1.5, 1.8 to 2.5, 3.0
where another might be 37.0
Originally I thought that something like this might work:
x <-c(0:.9, 1.9:2.9, 7.9:8.9, 4.0:4.9, 3:3.9, 5:5.9, 6:6.9, 11:11.9, 9:9.9, 10:10.9, 12.9, 13:13.9, 14,14.2, 14.8)
df$disease_cat[df$site_code %in% x] <- "disease a"
The problem is that 0.1, 0.2, etc. are not being recognized as being in the range of 0:0.9.
I now understand that 5:10 (for example) in R is actually 5, 6, 7, ..., 10.
What is a better way to code these intervals so that the decimals will be recognized as being in the interval 0 to 0.9? (Keep in mind that there will be many "mini" ranges, and the idea of coding them all explicitly isn't particularly appealing.)
Upvotes: 1
Views: 1896
Reputation: 32558
# This is the list of ranges that you want to check
ranges = list(c(0,.9), c(1.9,2.9), c(7.9,8.9), c(4.0,4.9), c(3,3.9), c(5,5.9), c(6,6.9), c(11,11.9), c(9,9.9), c(10,10.9), c(12.9), c(13,13.9), c(14), c(14.2), c(14.8))
# These are the values that you want to check against each range in ranges
values = c(1,2,3,4.5)
# Check each value against each range. The bounds are inclusive so that
# endpoints and single-value ranges such as c(14) can match.
output = data.frame(t(sapply(ranges, function(x) (min(x) <= values & max(x) >= values))))
# Set the column names to the values so it is clear what is being checked.
# Column names are the values; row names are the indexes of the ranges.
colnames(output) = values
output$ranges = sapply(ranges, function(x) paste(x, collapse = "-"))
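To tie a range list like this back to the data frame in the question, one option is to collapse the per-range checks into a single logical vector and use it as an index. The following is a minimal sketch assuming inclusive bounds; in_any_range and the sample df are illustrative names and data, not part of the original post or of base R.

```r
# Hypothetical sample data; site_code and disease_cat follow the question's naming.
df <- data.frame(site_code = c(0.1, 0.9, 1.5, 2.0, 12.9, 14.2, 14.5))

# Each range is c(lower, upper); a single code is a length-one vector.
ranges <- list(c(0, 0.9), c(1.9, 2.9), c(12.9), c(14), c(14.2), c(14.8))

# Illustrative helper: TRUE where a value falls inside at least one range.
in_any_range <- function(values, ranges) {
  hits <- vapply(ranges,
                 function(r) values >= min(r) & values <= max(r),
                 logical(length(values)))
  # Force a values-by-ranges matrix so rowSums also works for a single value.
  rowSums(matrix(hits, nrow = length(values))) > 0
}

df$disease_cat <- NA_character_
df$disease_cat[in_any_range(df$site_code, ranges)] <- "disease a"
print(df)
```

Rows whose code falls outside every range keep NA, so further categories can be filled in with additional range lists.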
Upvotes: 1
Reputation: 1515
You can find the answer by printing the content of c(1.1:4). The result is [1] 1.1 2.1 3.1. What you need is the findInterval function. Check out this solution:
findInterval(c(1,2,3,4.5), c(1.1,4)) == 1
If you would like the right boundary to be inclusive, i.e. the interval [1.1, 4], you can use the rightmost.closed parameter:
findInterval(c(1,2,3,4.5), c(1.1,4), rightmost.closed = TRUE) == 1
EDIT:
Here is the solution for a more general problem you have described:
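# stringsAsFactors = FALSE keeps the disease names as character strings
# (before R 4.0 they would otherwise become factor codes inside sapply)
d = data.frame(disease = c('d1', 'd2', 'd3'), minValue = c(0.3, 1.2, 2.2), maxValue = c(0.6, 1.9, 2.5), stringsAsFactors = FALSE)
measurements = c(0.1, 0.5, 2.2, 0.3, 2.7)
# Return the disease whose [minValue, maxValue] range contains the measurement, or NA if none does
findDiagnosis <- function(data, measurement) {
  diagnosis = data[data$minValue <= measurement & measurement <= data$maxValue,]
  if (nrow(diagnosis) == 0) {
    return(NA)
  } else {
    return(diagnosis$disease)
  }
}
sapply(measurements, findDiagnosis, data = d)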
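For many measurements, the same lookup can be done without calling a function once per value. The sketch below is one possible vectorized variant, assuming the ranges do not overlap (if they did, only the first matching row would be used); the names inside, first_match and labels are illustrative.

```r
d <- data.frame(disease = c("d1", "d2", "d3"),
                minValue = c(0.3, 1.2, 2.2),
                maxValue = c(0.6, 1.9, 2.5),
                stringsAsFactors = FALSE)
measurements <- c(0.1, 0.5, 2.2, 0.3, 2.7)

# inside[i, j] is TRUE when measurement i lies in range j (both ends inclusive).
inside <- outer(measurements, d$minValue, ">=") &
          outer(measurements, d$maxValue, "<=")

# Index of the first matching range per measurement; NA when nothing matches.
first_match <- apply(inside, 1, function(row) which(row)[1])

# Indexing with NA yields NA, so unmatched measurements stay NA.
labels <- d$disease[first_match]
print(labels)
```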
Upvotes: 1
Reputation: 19857
I think you want this:
c(1,2,3,4.5) >= 1.1 & c(1,2,3,4.5) <= 4
[1] FALSE TRUE TRUE FALSE
Examine the output of 1.1:4:
1.1:4
[1] 1.1 2.1 3.1
You are actually testing whether the elements of your vector are exactly equal to 1.1, 2.1, or 3.1.
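Exact matching stays fragile even if the sequence is built in steps of 0.1, because the generated values are floating-point numbers that need not equal the literals being looked up. The short sketch below illustrates this on a typical IEEE 754 double platform, with 0.3 chosen as an example value, and contrasts it with a bounds comparison.

```r
# Enumerate the codes in steps of 0.1 and test membership by exact equality.
steps <- seq(0, 0.9, by = 0.1)
exact_match <- 0.3 %in% steps

# Compare against the bounds instead of enumerating the values.
bounds_match <- 0.3 >= 0 & 0.3 <= 0.9

print(c(exact_match = exact_match, bounds_match = bounds_match))
```

The bounds comparison does not depend on how the intermediate values are rounded, which is why it is the more robust way to express "between 0 and 0.9".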
Upvotes: 1