Reputation: 6496
I want to split a character string into two groups. The string's structure is pretty simple, yet I haven't been able to make it work.
txt <- "text12-01-2016"
It's always some letters followed by a date, and the date obviously starts with a number. I've tried the following regex at https://regex101.com/ and it separates the string properly:
([a-zA-Z]*)([0-9].*)
1. "text"
2. "12-01-2016"
But when I try in R it fails:
strsplit(a[1],split = "([a-zA-Z]*)([0-9]*)")
[[1]]
[1] "" " " "" "." " " "" " " "" "-" "" "-" ""
And if I introduce double square brackets, it "eats" the last character of the first group and the first character of the second:
strsplit(txt,split = "([[a-zA-Z]]*)([[0-9]]*)")
[[1]]
[1] "tex" "2-01-2016"
It doesn't matter if I use perl=TRUE. The result is also consistent if I use stringi::stri_split, so it's a problem in my regex.
What is the correct regex to use in this case?
Upvotes: 2
Views: 70
Reputation: 627082
The "problem" here is that you have a regex for matching, not for splitting.
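To illustrate that point, here is a small sketch (not part of the original post). It assumes the default regex engine used by strsplit and regexpr, and the "a1b2c3" string is a made-up input. strsplit throws away whatever the pattern matches, and in the double-bracket attempt the pattern appears to match one letter plus one digit, which would explain the "eaten" characters:

```r
txt <- "text12-01-2016"

# strsplit() discards whatever the pattern matches and returns what is left.
# "a1b2c3" is a hypothetical input used only for illustration.
kept <- strsplit("a1b2c3", split = "[0-9]")[[1]]
kept
# [1] "a" "b" "c"

# In "[[a-zA-Z]]*" the bracket expression is "[[a-zA-Z]" (a letter or a
# literal "["), followed by a literal "]" repeated zero or more times.
# So the whole pattern matches one letter followed by one digit.
eaten <- regmatches(txt, regexpr("([[a-zA-Z]]*)([[0-9]]*)", txt))
eaten
```

Whatever ends up in eaten is exactly the text that strsplit removed in the question's second attempt.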
You can use the following PCRE regex with strsplit:
strsplit(txt, split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)
[[1]]
[1] "text" "12-01-2016"
The regex matches the zero-width position between a letter and a digit, and strsplit splits the string there, so no characters are consumed. The result is a list; use unlist (or [[1]]) on it if you need a plain character vector.
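If you have several strings, the same split can be collected into a matrix. This is only a sketch: the second input string and the column names "prefix" and "date" are made up for illustration, and it assumes every string splits into exactly two pieces:

```r
# Hypothetical input vector; only the first element comes from the question.
txt <- c("text12-01-2016", "abc03-02-2017")

# Zero-width split point: preceded by a letter, followed by a digit.
parts <- strsplit(txt, split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)

# One row per input string, one column per piece
# (assumes each string yields exactly two pieces).
pieces <- do.call(rbind, parts)
colnames(pieces) <- c("prefix", "date")
pieces
```

Keeping the date column as character avoids guessing whether the dates are day-first or month-first.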
If you want to use your regex, use str_match from stringr:
> library(stringr)
> str_match(txt, "([a-zA-Z]*)([0-9].*)")
     [,1]             [,2]   [,3]
[1,] "text12-01-2016" "text" "12-01-2016"
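If you would rather not load stringr, a similar result should be possible in base R. This is a sketch rather than part of the original answer: regexec with regmatches returns the full match and the capture groups, and sub with backreferences extracts a single group:

```r
txt <- "text12-01-2016"
pattern <- "([a-zA-Z]*)([0-9].*)"

# regexec() records the positions of the full match and each capture group;
# regmatches() then extracts the corresponding substrings.
groups <- regmatches(txt, regexec(pattern, txt))[[1]]
groups
# [1] "text12-01-2016" "text"           "12-01-2016"

# Alternatively, replace the whole match with a single backreference.
prefix <- sub(pattern, "\\1", txt)
date   <- sub(pattern, "\\2", txt)
```

The first element of groups is the full match, so the two pieces are groups[2] and groups[3].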
Upvotes: 5