Economist_Ayahuasca

Reputation: 1642

Adapting string variables to specific characteristics in R

I have the following data:

id code
1  I560
2  K980
3  R30
4  F500
5  650

I would like to do two things with the column code: i) keep only the leading letter and the two digits that follow it, and ii) remove the observations whose code does not start with a letter. In the end, the data frame should look like this:

id code
1  I56
2  K98
3  R30
4  F50

Upvotes: 1

Views: 26

Answers (2)

akrun

Reputation: 887611

An option with substr from base R:

df1$code <- substr(df1$code, 1, 3)
df1[grepl('^[A-Z]', df1$code),]
#  id code
#1  1  I56
#2  2  K98
#3  3  R30
#4  4  F50

data

df1 <- structure(list(id = 1:5, code = c("I56", "K98", "R30", "F50", 
"650")), row.names = c(NA, -5L), class = "data.frame")
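
For reference, here is a minimal self-contained sketch of the same substr/grepl approach run against the question's original codes. The data frame is rebuilt with data.frame() and the result is stored in a hypothetical object named res; both are assumptions for illustration, and the codes are assumed to be character strings starting with an uppercase letter when valid.

```r
# Rebuild the question's data (codes assumed to be character, not factor)
df1 <- data.frame(id = 1:5,
                  code = c("I560", "K980", "R30", "F500", "650"),
                  stringsAsFactors = FALSE)

# Keep only the rows whose code starts with an uppercase letter
res <- df1[grepl("^[A-Z]", df1$code), ]

# Truncate each remaining code to its first three characters
# (the letter plus the two digits after it)
res$code <- substr(res$code, 1, 3)
res
```

Filtering first and assigning the result keeps the original df1 untouched, which may be convenient if the full codes are needed later.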

Upvotes: 1

Ronak Shah

Reputation: 389175

In base R, you could do:

subset(transform(df, code = sub('([A-Z]\\d{2}).*', '\\1', code)), 
       grepl('^[A-Z]', code))

Or using tidyverse functions:

library(dplyr)
library(stringr)

df %>%
  mutate(code = str_extract(code, '[A-Z]\\d{2}')) %>%
  filter(str_detect(code, '^[A-Z]'))

#  id code
#1  1  I56
#2  2  K98
#3  3  R30
#4  4  F50
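
The tidyverse version works because str_extract() returns NA when the pattern does not match, and filter() drops rows where the condition evaluates to NA. A rough base R sketch of the same extract-then-drop idea is shown below; the vector names codes, extracted and kept are hypothetical, and the anchored pattern assumes a valid code is an uppercase letter followed by at least two digits.

```r
codes <- c("I560", "K980", "R30", "F500", "650")

# regexpr() returns the match position, or -1 where there is no match
m <- regexpr("^[A-Z][0-9]{2}", codes)

# Start with NA everywhere, then fill in the matched substrings
extracted <- rep(NA_character_, length(codes))
extracted[m > 0] <- regmatches(codes, m)

# Drop the entries that did not match
kept <- extracted[!is.na(extracted)]
kept
```

Anchoring the pattern with ^ means a code such as "650" yields NA rather than a partial match further along the string.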

Upvotes: 1
