Subset dataframe using number in string?

Question

I have a large data frame that looks something like this:

df1 = data.frame(A=c("A23", "A53", "B68"), B=c("Something-2030-002", "Something-4030-002",
                                               "Something-5030-002"))

I want to subset it to include only the observations with Something-X where X<5. That is:

df2 = data.frame(A=c("A23", "A53"), B=c("Something-2030-002", "Something-4030-002"))

How can I do this with R?

Thanks

akrun · Accepted Answer

You can use sub to remove all the characters except the one digit following the first "-" and use that to create a logical index.

df1[sub('[^-]+-(.).*', '\\1', df1$B) < 5, ]
#    A                  B
#1 A23 Something-2030-002
#2 A53 Something-4030-002
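
The same idea can be unpacked into separate steps, which makes each part easier to check. This is only a sketch: it assumes the first character after the first "-" is always a digit, and the names first_digit and df2 are illustrative. It also converts the extracted character with as.numeric so the comparison with 5 is numeric rather than a string comparison.

```r
# Example data from the question
df1 <- data.frame(
  A = c("A23", "A53", "B68"),
  B = c("Something-2030-002", "Something-4030-002", "Something-5030-002")
)

# Step 1: keep only the single character that follows the first "-"
# (the backslash in the backreference must be doubled inside an R string)
first_digit <- sub("[^-]+-(.).*", "\\1", df1$B)

# Step 2: convert to numeric so the comparison below is numeric
# (assumes that character is always a digit; otherwise this yields NA)
first_digit <- as.numeric(first_digit)

# Step 3: use the resulting logical vector to select rows
df2 <- df1[first_digit < 5, ]
```

If some values of B might not follow the "Something-X..." pattern, those rows would come out as NA in the index, so it may be worth filtering them first.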

