Reputation: 133
I have a vector of strings. Each element consists of one or more digits, then a month name or abbreviation, then more digits. I want to replace only "September" and its abbreviations in each string with "Sep" while keeping the numbers. This is what I have tried using the stringr package:
my.data <- c("01Sept2019", "05sep2019", "4September2019", "8sep2019",
"12oct2019", "4Jun2018", "17Mar2017", "09May2015", "13Sep19")
library(stringr)
my.data %>% str_replace_all("(?i)Sept?(ember)?[0-9]", "Sep")
#> [[1]]
#> [1] "01Sep019", "05Sep019", "4Sep019", "8Sep019", "13Sep9"
This is what I would like to obtain:
#> [1] "01Sep2019", "05Sep2019", "4Sep2019", "8Sep2019", "13Sep19"
Can someone please help me out? Thanks.
Upvotes: 2
Views: 106
Reputation: 39707
In base R you can use sub with the pattern [Ss]ep[[:alpha:]]* to find September and its abbreviations and replace them with Sep.
sub("[Ss]ep[[:alpha:]]*", "Sep", my.data)
#[1] "01Sep2019" "05Sep2019" "4Sep2019" "8Sep2019" "12oct2019" "4Jun2018"
#[7] "17Mar2017" "09May2015" "13Sep19"
To match only September, or a truncation of it (sep, sept, septe, ...) that is directly followed by a digit, you can use:
sub("sep(t|(?=\\d))(e|(?=\\d))(m|(?=\\d))(b|(?=\\d))(e|(?=\\d))(r|(?=\\d))"
, "Sep", my.data, ignore.case=TRUE, perl=TRUE)
#[1] "01Sep2019" "05Sep2019" "4Sep2019" "8Sep2019" "12oct2019" "4Jun2018"
#[7] "17Mar2017" "09May2015" "13Sep19"
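The attempt in the question loses a digit because [0-9] is part of the match, so it is replaced along with the month. Below is a minimal sketch of two ways to keep the digit, assuming the same my.data vector as in the question: a capture group that is written back with \\1, or a lookahead that checks for the digit without consuming it. The names with_backref and with_lookahead are only illustrative.

```r
my.data <- c("01Sept2019", "05sep2019", "4September2019", "8sep2019",
             "12oct2019", "4Jun2018", "17Mar2017", "09May2015", "13Sep19")

# Option 1: capture the digit after the month and put it back with \\1.
with_backref <- sub("sept?(?:ember)?([0-9])", "Sep\\1", my.data,
                    ignore.case = TRUE, perl = TRUE)

# Option 2: a lookahead asserts that a digit follows but does not consume it,
# so only the month part is replaced.
with_lookahead <- sub("sept?(?:ember)?(?=[0-9])", "Sep", my.data,
                      ignore.case = TRUE, perl = TRUE)

print(with_backref)
print(identical(with_backref, with_lookahead))
```

Both variants only accept sep, sept, and september, which is the same set of spellings the pattern in the question was written for.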
Upvotes: 4
Reputation: 887501
An option with str_replace
library(stringr)
library(dplyr)
my.data %>%
str_replace("(?i)Sep[^0-9]*", "Sep")  # * rather than +, so a bare "sep" before a digit is also rewritten
Upvotes: 1
Reputation: 4358
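In base R, grep with value = TRUE returns the elements that contain "Sep" or "sep". Note that this only selects the matching strings; it does not replace anything.
grep("Sep|sep", my.data, value = TRUE)
Output: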
[1] "01Sept2019" "05sep2019" "4September2019" "8sep2019"
[5] "13Sep19"
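If selecting the September entries first is useful, the selection can be combined with a replacement through logical indexing. This is only a sketch, assuming the my.data vector from the question; is_sep and result are illustrative names.

```r
my.data <- c("01Sept2019", "05sep2019", "4September2019", "8sep2019",
             "12oct2019", "4Jun2018", "17Mar2017", "09May2015", "13Sep19")

# Logical vector marking the elements that contain "sep" in any case.
is_sep <- grepl("sep", my.data, ignore.case = TRUE)

# Replace the month part only in the selected elements; the rest are untouched.
result <- my.data
result[is_sep] <- sub("sep[[:alpha:]]*", "Sep", my.data[is_sep],
                      ignore.case = TRUE)

print(result)
```

Since sub already leaves non-matching strings unchanged, the indexing step is optional; it mainly helps when the selected positions are needed for something else as well.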
Upvotes: 0