Jacob

Reputation: 17

Check if decimal values are in a range in R

I need to re-categorize codes that represent various diseases so as to form appropriate groups for later analysis.

Many of the groupings include ranges that look like this:

1.0 to 1.5, 1.8 to 2.5, 3.0

where another might be 37.0

Originally I thought that something like this might work:

x <-c(0:.9, 1.9:2.9, 7.9:8.9, 4.0:4.9, 3:3.9, 5:5.9, 6:6.9, 11:11.9, 9:9.9, 10:10.9, 12.9, 13:13.9, 14,14.2, 14.8)

df$disease_cat[df$site_code %in% x] <- "disease a"

The problem is that 0.1, 0.2, etc. are not recognized as being in the range 0:0.9.

I now understand that 5:10 (for example) in R actually expands to 5, 6, 7, ..., 10.

What is a better way to code these intervals so that the decimals will be recognized as being in the interval 0 to 0.9? (keeping in mind that there will be many "mini" ranges and the idea of coding them all explicitly isn't particularly appealing)

Upvotes: 1

Views: 1896

Answers (3)

d.b

Reputation: 32558

# The list of ranges to check; a single-element vector stands for one exact value
ranges = list(c(0,.9), c(1.9,2.9), c(7.9,8.9), c(4.0,4.9), c(3,3.9), c(5,5.9), c(6,6.9), c(11,11.9), c(9,9.9), c(10,10.9), c(12.9), c(13,13.9), c(14),c(14.2), c(14.8))

# The values to test against each range in ranges
values = c(1,2,3,4.5)

# Test every value against every range. The bounds are inclusive so that
# endpoints (e.g. 3 in 3-3.9) and single-value ranges (e.g. 12.9) can match.
output = data.frame(t(sapply(ranges, function(x) (min(x) <= values & values <= max(x)))))

# Set the column names to the values so it is clear what is being checked.
# Columns are the values, rows follow the order of ranges.
colnames(output) = values
output$ranges = sapply(ranges, function(x) paste(x, collapse = "-"))
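To go from a per-range table to the single assignment in the question, one option is to collapse all ranges into one logical vector. This is only a minimal sketch, assuming closed ranges: the contents of df are made up, only a few of the ranges are shown, and in_any_range is a hypothetical helper name, not an existing R function.

```r
# Hypothetical data standing in for the asker's df
df <- data.frame(site_code = c(0.1, 0.95, 2.5, 12.9, 14.2, 14.5))

# A few of the ranges; a single value is written as c(v, v)
ranges <- list(c(0, 0.9), c(1.9, 2.9), c(12.9, 12.9), c(14.2, 14.2))

# Hypothetical helper: TRUE where x falls inside at least one closed range
in_any_range <- function(x, ranges) {
  hits <- vapply(ranges, function(r) x >= min(r) & x <= max(r),
                 logical(length(x)))
  # Force a matrix (one row per element of x) even when length(x) is 1
  hits <- matrix(hits, nrow = length(x))
  rowSums(hits) > 0
}

df$disease_cat <- NA_character_
df$disease_cat[in_any_range(df$site_code, ranges)] <- "disease a"
```

Rows whose site_code matches no range keep NA, so a second call with another list of ranges can fill in the next category.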

Upvotes: 1

Ardavel

Reputation: 1515

You can see the problem by printing the content of c(1.1:4). The result is [1] 1.1 2.1 3.1, because the : operator steps by 1 from the starting value. What you need is the findInterval function. Check out this solution:

findInterval(c(1,2,3,4.5), c(1.1,4)) == 1

If you would like the right boundary to be inclusive, i.e. the interval [1.1, 4], you can use the rightmost.closed parameter:

findInterval(c(1,2,3,4.5), c(1.1,4), rightmost.closed = TRUE) == 1
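A quick way to see the difference is to include the boundary values themselves in the test vector. The sketch below uses made-up inputs; the variable names are only for illustration.

```r
x <- c(1, 1.1, 2, 4, 4.5)
breaks <- c(1.1, 4)

# Default: intervals are closed on the left and open on the right,
# so 4 already falls past the last break
open_right <- findInterval(x, breaks)
open_right            # 0 1 1 2 2

# rightmost.closed = TRUE counts a value equal to the last break
# as belonging to the last interval
closed_right <- findInterval(x, breaks, rightmost.closed = TRUE)
closed_right          # 0 1 1 1 2

closed_right == 1     # FALSE TRUE TRUE TRUE FALSE
```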

EDIT:

Here is the solution for a more general problem you have described:

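# One row per disease with its inclusive range.
# stringsAsFactors = FALSE keeps the names as character in R versions before 4.0.
d = data.frame(disease = c('d1', 'd2', 'd3'), minValue = c(0.3, 1.2, 2.2), maxValue = c(0.6, 1.9, 2.5),
               stringsAsFactors = FALSE)
measurements = c(0.1, 0.5, 2.2, 0.3, 2.7)

# Return the disease whose range contains the measurement, or NA if none does
findDiagnosis <- function(data, measurement) {
  diagnosis = data[data$minValue <= measurement & measurement <= data$maxValue,]
  if (nrow(diagnosis) == 0) {
    return(NA)
  } else {
    return(diagnosis$disease)
  }
}

# Look up each measurement in turn
sapply(measurements, findDiagnosis, data = d)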
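When one category is made of several small ranges, as in the question, the same lookup idea can be done without a per-measurement function by comparing all codes against all ranges at once. This is a rough sketch under assumptions: the lookup table, its column names, and the sample codes are hypothetical, and when ranges overlap only the first match is kept.

```r
# Hypothetical lookup table: one row per closed range;
# several rows may share the same category
lookup <- data.frame(
  category = c("disease a", "disease a", "disease b"),
  lower    = c(0,   1.9, 3),
  upper    = c(0.9, 2.9, 3.9),
  stringsAsFactors = FALSE
)

site_code <- c(0.1, 2.5, 3, 1.2)

# inside[i, j] is TRUE when site_code[i] lies within range j
inside <- outer(site_code, lookup$lower, ">=") &
          outer(site_code, lookup$upper, "<=")

# Index of the first matching range for each code, NA when none matches
first_hit <- apply(inside, 1, function(row) which(row)[1])

disease_cat <- lookup$category[first_hit]
disease_cat
```

Here 1.2 falls between the listed ranges, so it comes back as NA rather than being forced into a category.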

Upvotes: 1

scoa

Reputation: 19857

I think you want this:

c(1,2,3,4.5) >= 1.1 & c(1,2,3,4.5) <= 4
[1] FALSE  TRUE  TRUE FALSE

Examine the output of 1.1:4:

1.1:4
[1] 1.1 2.1 3.1

You are actually testing whether elements of your vector are exactly equal to 1.1, 2.1, or 3.1.
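The same applies to the ranges in the question. A small check, assuming 0.1 as a sample code:

```r
x <- 0:0.9              # steps by 1 starting at 0, so only 0 fits
length(x)               # 1

0.1 %in% x              # FALSE: %in% tests for exact equality with an element
0.1 >= 0 & 0.1 <= 0.9   # TRUE: comparing against the bounds works
```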

Upvotes: 1
