rjf298

Reputation: 3

Regex in R - returning only single integers (less than 10) from integer string (up to 100)

I have a file with people's ages and want to subset age ranges (e.g. under 10, 35-44, etc.).

Subsetting double-digit age ranges works fine using grep:

X_35_44 <- X[ grep("35|36|37|38|39|40|41|42|43|44", X$Age) , ]

However, when I try to subset anything under 10, e.g.:

X_10under <- X[ grep("0|1|2|3|4|5|6|7|8|9|10|", X$Age) , ]

I get back any age with a 1 in it (e.g. 31), or a 2 or a 3, rather than just the ages under 10.

How do I ensure that this doesn't happen?

Any help would be much appreciated!

Thanks in advance

Upvotes: 0

Views: 119

Answers (2)

Mostafa90

Reputation: 1706

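A solution with ifelse():

    # Convert to integer and keep the result (ages read from a file may be text)
    df$age <- as.integer(df$age)
    # R does not allow chained comparisons like 10 < x < 20; the nested ifelse already excludes ages below 10
    df$age_cat <- ifelse(df$age < 10, "age_0-10", ifelse(df$age < 20, "age_10-20", "age_20-"))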

Choose your own range ...
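
If the goal is to pull out the rows for one age band rather than label every row, the same numeric comparisons can index the data frame directly. This is a minimal sketch, assuming a data frame X with an Age column as in the question; the sample values are made up for illustration.

```r
# Hypothetical sample data; the real X would come from the asker's file
X <- data.frame(Age = c("5", "31", "9", "40", "10", "72", "35"),
                stringsAsFactors = FALSE)

# Ages read from a file may arrive as text, so convert before comparing
X$Age <- as.integer(X$Age)

# Rows with ages strictly under 10
X_under10 <- X[X$Age < 10, , drop = FALSE]

# Rows with ages from 35 to 44 inclusive
X_35_44 <- X[X$Age >= 35 & X$Age <= 44, , drop = FALSE]
```

Because the comparison is numeric, an age such as 31 is never picked up just because it contains the digit 1.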

Upvotes: 1

IRTFM

Reputation: 263499

On the principle of not patching failed code but delivering a more effective solution instead, I'm going to disagree with the regex strategy and suggest you use cut or findInterval.

X <- data.frame(Ages = sample(1:85, 300, replace = TRUE))
X$age_cat <- cut(X$Ages, c(0, 10, 45, 60, 75, Inf), labels = c("under10",
    "10-44", "45-59", "60-74", "75+"), right = FALSE, include.lowest = TRUE)
head(X)
#=========    
  Ages age_cat
1   65   60-74
2   34   10-44
3   19   10-44
4   79     75+
5    5 under10
6   51   45-59
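
findInterval does the same job by returning the index of the band each age falls into, which can then be used to look up a label. This is a minimal sketch, assuming the same break points as in the cut call; the names ages, breaks and labels and the sample values are illustrative only.

```r
# Illustrative ages, chosen to sit on and around the break points
ages <- c(5, 9, 10, 44, 45, 60, 79)

# Lower bound of each band; findInterval returns the index of the band
# whose lower bound is the largest one not exceeding the age
breaks <- c(0, 10, 45, 60, 75)
labels <- c("under10", "10-44", "45-59", "60-74", "75+")

# Look up the label for each age's band
age_cat <- labels[findInterval(ages, breaks)]
age_cat
```

The intervals are closed on the left by default, so an age of exactly 10 lands in "10-44", matching right=FALSE in cut.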

Upvotes: 1
