Reputation: 43
I'm trying to find a way to split a character column with an ellipsis in the middle into two columns, everything before the ellipsis and everything after.
For example, if I have:
a <- "60.4 (b)(33) and (e)(1) revised....................................46111"
How do I split that into "60.4 (b)(33) and (e)(1) revised" and "46111"?
I have tried:
str_extract(a, ".*\\.{2,}")
for the first part, and for the second part:
str_extract(a, "\\.{2,}.*")
but that keeps the ellipsis in both, which I'd like to drop.
Upvotes: 4
Views: 714
Reputation: 627292
It seems you want to split, not to extract, with a pattern that matches two or more consecutive dots:
a <- "60.4 (b)(33) and (e)(1) revised....................................46111"
unlist(stringr::str_split(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
## Base R strsplit:
unlist(strsplit(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
There is another possible splitting regex here: you can match one or more dots that are followed by one or more digits at the end of the string:
unlist(stringr::str_split(a, "\\.+(?=\\d+$)"))
unlist(strsplit(a, "\\.+(?=\\d+$)", perl=TRUE))
Both yield the same output, [1] "60.4 (b)(33) and (e)(1) revised" "46111". Here, \.+ matches one or more dots and (?=\d+$) is a positive lookahead that matches a location that is immediately followed by one or more digits (\d+) and then the end of the string ($).
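To see where the two splitting patterns differ, here is a small sketch on a made-up string (the value of b is hypothetical, not from the question) that has an extra run of dots before the leader. It uses base R only:

```r
# Hypothetical entry with an extra run of dots before the dot leader.
b <- "Sec. 1...3 revised.....46111"

# Any run of two or more dots is a split point, so the early "..." also splits.
loose <- unlist(strsplit(b, "\\.{2,}"))

# Only dots that are directly followed by the trailing digits are a split point.
strict <- unlist(strsplit(b, "\\.+(?=\\d+$)", perl = TRUE))

print(loose)   # three pieces: "Sec. 1", "3 revised", "46111"
print(strict)  # two pieces: "Sec. 1...3 revised", "46111"
```

If the text before the leader can never contain consecutive dots, the simpler \.{2,} pattern should be enough.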
Another approach is a matching one with str_match (to capture the bits you need):
res <- stringr::str_match(a, "^(.*?)\\.+(\\d+)$")
res[,-1]
# => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
Here,
^ - matches the start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\.+ - one or more dots
(\d+) - Group 2: one or more digits
$ - end of string.
The res[,-1] is necessary to remove the first column with the full matches.
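Since the question mentions a column, here is a minimal base R sketch of applying the split to a data frame. The data frame toc, its column names (entry, text, page), and the second row are made up for illustration, and the code assumes every entry contains exactly one dot leader:

```r
# Hypothetical data frame; the names `toc`, `entry`, `text`, `page` are illustrative only.
toc <- data.frame(
  entry = c(
    "60.4 (b)(33) and (e)(1) revised....................................46111",
    "60.5 (a) amended..........46112"
  ),
  stringsAsFactors = FALSE
)

# Split each entry on a run of two or more dots; the result is a list of character vectors.
pieces <- strsplit(toc$entry, "\\.{2,}")

# Assumption: every entry yields exactly two pieces; stop early otherwise.
stopifnot(all(lengths(pieces) == 2))

# Pull the first and second piece of each entry into new columns.
toc$text <- vapply(pieces, `[`, character(1), 1)
toc$page <- vapply(pieces, `[`, character(1), 2)

print(toc[, c("text", "page")])
```

The stopifnot() check is there because rows without a leader, or with several, would otherwise end up with missing or shifted values.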
Upvotes: 4