Regex in R - returning only single integers (less than 10) from integer string (up to 100)

Question

I have a file with people's ages, and want to subset age ranges (e.g. under 10, 35-44, etc.).

Subsetting double-digit age ranges works fine using grep:

X_35_44 <- X[ grep("35|36|37|38|39|40|41|42|43|44", X$Age) , ]

But when I try to subset anything under 10, e.g.:

X_10under <- X[ grep("0|1|2|3|4|5|6|7|8|9|10|", X$Age) , ]

I get back any age containing a 1 (e.g. 31), a 2, or a 3, rather than just the ages under 10.

How do I ensure that this doesn't happen?

Any help would be much appreciated!

Thanks in advance

IRTFM · Accepted Answer

Rather than patching the failing regex, I'd suggest a more effective approach. Ages are numbers, so I'm going to disagree with the regex strategy and suggest you instead bin them numerically with cut or findInterval.

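# Simulated data: 300 ages drawn from 1 to 85
X <- data.frame(Ages = sample(1:85, 300, replace=TRUE))
# right=FALSE closes each bin on the left: [0,10), [10,45), [45,60), ...
X$age_cat <- cut(X$Ages, c(0, 10, 45, 60, 75, Inf), labels=c("under10", 
    '10-44','45-59','60-74','75+'), right=FALSE, include.lowest=TRUE)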
head(X)
#=========    
  Ages age_cat
1   65   60-74
2   34   10-44
3   19   10-44
4   79     75+
5    5 under10
6   51   45-59
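
The findInterval route mentioned above might look something like the following. This is only a sketch: the seed, the breaks and labels vectors, and the X_under10 / X_35_44 names are assumptions chosen to mirror the cut example, not anything from the original data.

```r
# Illustrative data; the seed and column name are assumptions for this sketch.
set.seed(1)
X <- data.frame(Ages = sample(1:85, 300, replace = TRUE))

# Lower bound of each age band. findInterval returns, for each age, the index
# of the band it falls into (bands are closed on the left by default).
breaks <- c(0, 10, 45, 60, 75)
labels <- c("under10", "10-44", "45-59", "60-74", "75+")
X$age_cat <- factor(labels[findInterval(X$Ages, breaks)], levels = labels)

# Subset one band by its label, or skip the labels and compare numerically.
X_under10 <- X[X$age_cat == "under10", ]
X_35_44   <- X[X$Ages >= 35 & X$Ages <= 44, ]
```

Either way the comparison is done on the numeric value, so an age such as 31 can never be picked up just because it contains the digit 1.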
