Reputation: 93
In Dart, I would like to split a string using a regular expression and include the matching delimiters in the resulting list. So with the delimiter ".", I want the string "123.456.789" to get split into [123, ., 456, ., 789].
According to https://stackoverflow.com/a/15668433, in some languages, such as C#, JavaScript, Python and Perl, this can be done simply by wrapping the delimiter in capturing parentheses. For JavaScript, the behaviour is documented at https://ecma-international.org/ecma-262/9.0/#sec-regexp.prototype-@@split.
This doesn't seem to work in Dart, however:
print("123.456.789".split(new RegExp(r"(\.)")));
yields exactly the same result as without the parentheses: [123, 456, 789]. Is there a way to get split() to work like this in Dart? Otherwise I guess it will have to be an allMatches() implementation.
Edit: Using ((?<=\.)|(?=\.)) as the regex apparently does the job for a single delimiter, with lookbehind and lookahead. I will actually have a bunch of delimiters, and I'm not sure about the efficiency of this method. Can someone advise whether it's fine? Legibility is certainly reduced: to allow the delimiters "." and ";", is it enough to write either of the following?
((?<=\.)|(?=\.)|(?<=;)|(?=;))
((?<=\.|;)|(?=\.|;))
Testing
print("123.456.789;abc;.xyz.;ABC".split(new RegExp(r"((?<=\.|;)|(?=\.|;))")));
indicates that both work.
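Regarding legibility with many delimiters, one option is to generate the lookaround alternation from a plain list instead of writing it by hand. This is a minimal sketch assuming single-character delimiters; the helper name delimiterLookaroundPattern is hypothetical, while RegExp.escape is part of dart:core. It says nothing about performance.

```dart
/// Hypothetical helper: builds ((?<=A|B)|(?=A|B)) from a list of delimiters.
/// RegExp.escape makes characters such as "." match literally.
RegExp delimiterLookaroundPattern(List<String> delimiters) {
  final alternatives = delimiters.map(RegExp.escape).join('|');
  // Zero-width match wherever a delimiter is just behind or just ahead.
  return RegExp('((?<=$alternatives)|(?=$alternatives))');
}

void main() {
  final pattern = delimiterLookaroundPattern(['.', ';']);
  print(pattern.pattern);
  // Expected: [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
  print('123.456.789;abc;.xyz.;ABC'.split(pattern));
}
```

Adding a delimiter then only means extending the list, and escaping avoids accidentally treating "." or "|" as regex syntax.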
Upvotes: 9
Views: 4044
Reputation: 8720
There is no direct support for it in the standard library, but it is fairly straightforward to roll your own implementation based on RegExp.allMatches(). For example:
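// Adds a method to RegExp that splits [input] and keeps each delimiter.
extension RegExpExtension on RegExp {
  List<String> allMatchesWithSep(String input, [int start = 0]) {
    var result = <String>[];
    for (var match in allMatches(input, start)) {
      // Text between the previous delimiter (or [start]) and this match.
      result.add(input.substring(start, match.start));
      // The delimiter itself.
      result.add(match[0]!);
      start = match.end;
    }
    // Whatever follows the last delimiter.
    result.add(input.substring(start));
    return result;
  }
}

// Convenience wrapper so the call reads like String.split().
extension StringExtension on String {
  List<String> splitWithDelim(RegExp pattern) =>
      pattern.allMatchesWithSep(this);
}

void main() {
  print("123.456.789".splitWithDelim(RegExp(r"\."))); // [123, ., 456, ., 789]
  print(RegExp(r" ").allMatchesWithSep("lorem ipsum dolor sit amet"));
}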
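One thing to watch with allMatchesWithSep: when two delimiters are adjacent (like ";." in the question's edit), the text between them is an empty string, so the result contains "" entries. Here is a minimal variant that drops those empty pieces; the function name splitKeepingDelimiters is hypothetical, and the character class [.;] is just the question's two delimiters.

```dart
/// Hypothetical helper: splits [input] on [pattern], keeping each delimiter
/// as its own element and skipping empty strings between adjacent delimiters.
List<String> splitKeepingDelimiters(String input, RegExp pattern) {
  final result = <String>[];
  var start = 0;
  for (final match in pattern.allMatches(input)) {
    // Text since the previous delimiter; empty if delimiters are adjacent.
    final piece = input.substring(start, match.start);
    if (piece.isNotEmpty) result.add(piece);
    // The delimiter itself (a zero-width match would be empty, so skip it).
    final delimiter = match[0]!;
    if (delimiter.isNotEmpty) result.add(delimiter);
    start = match.end;
  }
  // Trailing text after the last delimiter, if any.
  final tail = input.substring(start);
  if (tail.isNotEmpty) result.add(tail);
  return result;
}

void main() {
  // Expected: [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
  print(splitKeepingDelimiters('123.456.789;abc;.xyz.;ABC', RegExp(r'[.;]')));
}
```

Since only empty strings are dropped, joining the result still gives back the original input.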
Upvotes: 11
Reputation: 22817
Given your initial string:
123.456.789
And expected results (split on and including delimiters):
[123, ., 456, ., 789]
You can use the following regex:
(?!^|$)\b
It matches positions that are word boundaries, except at the start or end of the string.
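In Dart this zero-width pattern can be passed straight to String.split(), because nothing needs to be captured: the cut positions themselves separate the delimiters from the numbers. A minimal sketch, assuming the Dart VM's split() behaviour for zero-width matches. Note that \b only sees transitions between word characters ([A-Za-z0-9_]) and non-word characters, so this relies on the pieces being digits and the delimiter being a non-word character.

```dart
void main() {
  // A word boundary that is not at the very start or end of the input.
  final boundary = RegExp(r'(?!^|$)\b');
  // Zero-width matches: split() cuts at each boundary without consuming text.
  final parts = '123.456.789'.split(boundary);
  print(parts); // [123, ., 456, ., 789]
}
```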
Now for your edit, given the following string:
123.456.789;abc;.xyz.;ABC
You'd like the expected results (split on and including multiple delimiters):
[123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
You can use the following regex (adapted from the first one by adding an alternation).
See the regex sample here (I simulate the split by substituting a newline character, for display purposes).
Either of the following works:
(?!^|$)\b|(?!\w)\B(?!\w)
(?!^|$)\b|(?=\W)\B(?=\W)
# the long way (with case-insensitive matching) - allows underscore _ as delimiter
(?!^|$)(?:(?<=[a-z\d])(?![a-z\d])|(?<![a-z\d])(?=[a-z\d])|(?<![a-z\d])(?![a-z\d]))
It matches positions that are word boundaries, except at the start or end of the string, or positions that are not word boundaries and are followed by a non-word character, which in the middle of the string means between two adjacent delimiters.
Note: This will work in Dart 2.3.0 and up since lookbehind support was added (see here for more info).
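Here is a minimal Dart sketch of the first variant above used with String.split(). It assumes the Dart VM's split() behaviour for zero-width matches, and the expected list is the one given in this answer.

```dart
void main() {
  const input = '123.456.789;abc;.xyz.;ABC';
  // Word boundary not at the start/end, or a non-boundary position followed
  // by a non-word character (e.g. between the adjacent ";" and ".").
  final pattern = RegExp(r'(?!^|$)\b|(?!\w)\B(?!\w)');
  final parts = input.split(pattern);
  // Expected: [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
  print(parts);
  // Zero-width matches consume nothing, so joining restores the input.
  print(parts.join() == input); // true
}
```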
Upvotes: 1