Adapting string variables to specific characteristics in R

Question

I have the following data:

id code
1  I560
2  K980
3  R30
4  F500
5  650

I would like to do two things with the column code: (i) keep only the letter and the two digits that follow it, and (ii) remove the observations that do not start with a letter. In the end, the data frame should look like this:

id code
1  I56
2  K98
3  R30
4  F50

Ronak Shah · Accepted Answer

In base R, you could do:

subset(transform(df, code = sub('([A-Z]\\d{2}).*', '\\1', code)), 
       grepl('^[A-Z]', code))

Or using tidyverse functions:

library(dplyr)
library(stringr)

df %>%
  mutate(code = str_extract(code, '[A-Z]\\d{2}')) %>%
  filter(str_detect(code, '^[A-Z]'))

#  id code
#1  1  I56
#2  2  K98
#3  3  R30
#4  4  F50
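
To try this end to end, here is a minimal base R sketch. It assumes the data frame is called df and that code is a character column; the question does not show how the data were created, so the data.frame() call below is a reconstruction, and result is just an illustrative name.

```r
# Assumed reconstruction of the question's data (not shown in the original post).
df <- data.frame(
  id = 1:5,
  code = c("I560", "K980", "R30", "F500", "650"),
  stringsAsFactors = FALSE
)

# Keep the leading uppercase letter plus the next two digits and drop the rest.
# Backslashes are doubled because R string literals need "\\d" to produce \d.
df$code <- sub("^([A-Z]\\d{2}).*", "\\1", df$code)

# Keep only the rows whose code starts with an uppercase letter.
result <- df[grepl("^[A-Z]", df$code), ]

print(result)
```

Codes that do not match the pattern (such as "650") are left unchanged by sub(), which is why the grepl() filter is still needed afterwards.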
