Following on from the question linked below:
How to test if the first three characters in a string are letters or numbers in r?
How do I extend it to also check that the 4th character is numeric? For instance, my data frame looks like this:
ID X
1 MJF34
2 GA249D
3 DEW235R
4 4SDFR3
5 DAS3
6 BHFS7
So, to restate: I want the first three characters of the string to be letters and the 4th to be a digit (0-9). If this rule holds, I want the first three letters of X copied into a new column Y; if not, Y should be "FR". The final dataset should therefore look like this:
ID X Y
1 MJF34 MJF
2 GA249D FR
3 DEW235R DEW
4 4SDFR3 FR
5 DAS3 DAS
6 BHFS7 FR
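To make the example reproducible, the input and desired output above can be written as R code. This is a sketch: `df` matches the name used in the question's code, while `expected_Y` is a hypothetical name for the target column.

```r
# Example input from the question; stringsAsFactors = FALSE keeps X as
# character in R versions before 4.0, where factors were the default.
df <- data.frame(
  ID = 1:6,
  X  = c("MJF34", "GA249D", "DEW235R", "4SDFR3", "DAS3", "BHFS7"),
  stringsAsFactors = FALSE
)

# Desired Y column (hypothetical name), copied from the table above.
expected_Y <- c("MJF", "FR", "DEW", "FR", "DAS", "FR")
```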
What I have so far, which checks the first three characters, is:
sub_string<-substr(df$X, 1, 3)
df$Y<-ifelse(grepl('[0-9]',sub_string), "FR", sub_string)
I have tried to extend it to check the 4th character as well, but it doesn't seem to work:
sub_number<-substr(df$X, 4, 4)
df$Y<-ifelse(grepl('[0-9]',sub_string) && !grepl('[0-9]',sub_number), "FR", sub_string)
I'm probably doing something obviously wrong, but I can't figure out what. Thanks in advance.
The stringr package may be useful in your case:
library(dplyr)
library(stringr)
df %>%
  mutate(Y = if_else(str_detect(X, "^[A-Z]{3}[0-9]"),
                     str_sub(X, start = 1, end = 3),
                     "FR"))
Output:
# A tibble: 6 x 3
ID X Y
<int> <chr> <chr>
1 1 MJF34 MJF
2 2 GA249D FR
3 3 DEW235R DEW
4 4 4SDFR3 FR
5 5 DAS3 DAS
6 6 BHFS7 FR
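If you would rather avoid the dplyr/stringr dependencies, a base-R sketch of the same rule can extract the three letters directly with a regular-expression lookahead. Note this swaps in a different technique (`regexpr()` plus `regmatches()`) rather than the answer's `str_detect()`/`str_sub()`, and `Y` here is a plain vector rather than a data-frame column.

```r
x <- c("MJF34", "GA249D", "DEW235R", "4SDFR3", "DAS3", "BHFS7")

# Match three upper-case letters at the start, but only when a digit follows.
# The lookahead (?=[0-9]) needs perl = TRUE and is not included in the match.
m <- regexpr("^[A-Z]{3}(?=[0-9])", x, perl = TRUE)

# Default every element to "FR", then overwrite the ones that matched.
Y <- rep("FR", length(x))
Y[m != -1] <- regmatches(x, m)
Y
# [1] "MJF" "FR"  "DEW" "FR"  "DAS" "FR"
```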
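Based on the code you posted, you can use the version below. The key change is replacing && with |: the result should be "FR" if the first three characters contain a digit OR the 4th is not a digit, and | works element-wise, whereas && only handles a single value (and raises an error for longer vectors in R >= 4.3.0):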
x = c("MJF34", "GA249D", "DEW235R")
ifelse(grepl('[0-9]',substr(x, 1, 3)) | !grepl('[0-9]',substr(x, 4, 4)), "FR", substr(x, 1, 3))
# [1] "MJF" "FR" "DEW"
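To see why OR is the right combination, here is a small sketch that builds the "FR" condition and the opposite "keep" condition separately and checks that they are complements (De Morgan's law). The intermediate names `first3`, `fourth`, `is_fr` and `keep` are illustrative only.

```r
x <- c("MJF34", "GA249D", "DEW235R", "4SDFR3", "DAS3", "BHFS7")
first3 <- substr(x, 1, 3)
fourth <- substr(x, 4, 4)

# "FR" when the first three contain a digit OR the fourth is not a digit.
is_fr <- grepl("[0-9]", first3) | !grepl("[0-9]", fourth)

# Equivalent "keep" rule: no digit in the first three AND a digit in 4th place.
keep <- !grepl("[0-9]", first3) & grepl("[0-9]", fourth)
stopifnot(identical(is_fr, !keep))

Y <- ifelse(is_fr, "FR", first3)
Y
# [1] "MJF" "FR"  "DEW" "FR"  "DAS" "FR"
```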
You can wrap the same ifelse() expression in a function if you want to reuse it elsewhere in your code:
vec = c("MJF34", "GA249D", "DEW235R")
UpdateVector = function(x) ifelse(grepl('[0-9]',substr(x, 1, 3)) | !grepl('[0-9]',substr(x, 4, 4)), "FR", substr(x, 1, 3))
UpdateVector(vec)
# [1] "MJF" "FR" "DEW"
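One caveat: checking that the first three characters contain no digit is weaker than checking that they are letters, since punctuation also passes. The inputs below are hypothetical, chosen only to show the difference. The letter-based variant uses the POSIX class `[[:alpha:]]`, which also accepts lower-case letters (and, depending on locale, accented ones).

```r
# Hypothetical inputs: only the first follows the intended format.
x <- c("MJF34", "ab_1", "A-B2")

# Digit-based check from the answer above: "no digit" in the first three.
digit_check <- ifelse(grepl("[0-9]", substr(x, 1, 3)) | !grepl("[0-9]", substr(x, 4, 4)),
                      "FR", substr(x, 1, 3))

# Letter-based check: three alphabetic characters followed by a digit.
letter_check <- ifelse(grepl("^[[:alpha:]]{3}[[:digit:]]", x), substr(x, 1, 3), "FR")

digit_check   # "MJF" "ab_" "A-B"
letter_check  # "MJF" "FR"  "FR"
```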
I would use a logical index like this:
idx <- grepl("^[A-Z]{3}\\d", df$X) # you can use ignore.case=TRUE too
df$Y <- "FR"
df[idx, "Y"] <- substr(df[idx, "X"], 1, 3)
# ID X Y
#1 1 MJF34 MJF
#2 2 GA249D FR
#3 3 DEW235R DEW
#4 4 4SDFR3 FR
#5 5 DAS3 DAS
#6 6 BHFS7 FR
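The same logical-index idea can be wrapped in a reusable function. This is a minimal sketch; `prefix_or_fr()` and its arguments are hypothetical names. Indexing the column as a vector (`x[idx]`) instead of `df[idx, "X"]` also avoids a pitfall with tibbles, where `df[idx, "X"]` returns a one-column tibble rather than a character vector.

```r
# Hypothetical helper: first three characters where `pattern` matches,
# `fallback` everywhere else.
prefix_or_fr <- function(x, pattern = "^[A-Z]{3}[0-9]", fallback = "FR") {
  x <- as.character(x)               # also copes with factor columns
  idx <- grepl(pattern, x)
  out <- rep(fallback, length(x))
  out[idx] <- substr(x[idx], 1, 3)
  out
}

df <- data.frame(
  ID = 1:6,
  X  = c("MJF34", "GA249D", "DEW235R", "4SDFR3", "DAS3", "BHFS7"),
  stringsAsFactors = FALSE
)
df$Y <- prefix_or_fr(df$X)
df$Y
# [1] "MJF" "FR"  "DEW" "FR"  "DAS" "FR"
```

Passing a different `pattern` (for example one built for ignore-case matching) keeps the rest of the logic unchanged.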