Reputation: 20223
I have a regex that splits my string into an array.
Everything works fine, except that I would like to keep part of the delimiter.
Here is my regex:
(&#?[a-zA-Z0-9]+;)[\s]
In JavaScript, I am doing:
var test = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);
My paragraph is as follows:
Current addresses: † Biopharmaceutical Research and Development<br />
‡ Clovis Oncology<br />
§ Pisces Molecular <br />
|| School of Biological Sciences
¶ Department of Chemistry<br />
The problem is that I am getting 10 elements in my array and not 5 as I should. The delimiter is also being returned as an element of its own; my goal is to keep the delimiter attached to the split element rather than create a new one.
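A minimal reproduction of the behaviour described above, as a sketch: the short sample string is an assumption made for illustration, not the real paragraph. When the pattern passed to split contains a capturing group, the captured text is spliced into the result array, which is why the delimiters show up as separate elements.

```javascript
// Hypothetical short sample; the real paragraph is longer.
const sample = "a &dagger; one &para; two";

// Same pattern as in the question: the parentheses form a capturing group.
const parts = sample.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);

// The captured entities appear as their own elements between the pieces.
console.log(parts);
```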
Thank you very much for your help.
EDIT:
I would like to get this as a result:
1. † Biopharmaceutical Research and Development<br />
2. ‡ Clovis Oncology<br />
3. § § Pisces Molecular <br />
|| School of Biological Sciences
4. ¶ Department of Chemistry<br />
Upvotes: 2
Views: 3834
Reputation: 49582
Try using match instead:
var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);
Updated: added a required white-space \s match.
Explanation:
&#? matches & followed by an optional # (the question mark matches the preceding token zero or one times).
[a-zA-Z0-9] is a character class covering all upper- and lower-case letters and the digits. If you also want to accept an underscore, you could replace it with \w.
The + sign means that the preceding pattern must match one or more times, so it matches one or more of the characters a-z, A-Z and 0-9.
The ; matches the character ; literally.
The \s matches the white-space class. That includes space, tab and other white-space characters.
[^&]* is once again a character class, but since ^ is its first character the class is negated: instead of matching & it matches any character except &. The star matches the pattern zero or more times.
The g at the end, after the last /, means global; it makes match continue after the first match and return an array of all matches.
So: match & and an optional #, followed by one or more letters or digits, followed by ;, followed by a white-space character, followed by zero or more characters that aren't &.
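As a rough check of the pattern explained above, here is a sketch run against a sample string. It assumes the paragraph still contains the raw HTML entities (&dagger;, &Dagger;, &sect;, &para;) in place of the rendered symbols; the entity names are my assumption, not taken from the question.

```javascript
// Assumed raw form of the paragraph, with entities still encoded.
const paragraph =
  "Current addresses: &dagger; Biopharmaceutical Research and Development<br />\n" +
  "&Dagger; Clovis Oncology<br />\n" +
  "&sect; Pisces Molecular <br />\n" +
  "|| School of Biological Sciences\n" +
  "&para; Department of Chemistry<br />";

// Each match starts at an entity and runs up to (not including) the next "&".
const matches = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

matches.forEach((m, i) => console.log(i + 1 + ". " + m.trim()));
```

Note that the text before the first entity ("Current addresses: ") is not part of any match, and that any literal & inside an address would cut a match short.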
Upvotes: 1
Reputation: 43673
Using regex it is pretty simple:
var result = input.match(/&#?[^\W_]+;\s[^&]*/g);
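The only difference from the previous answer's pattern is the class [^\W_], which reads as "not a non-word character and not an underscore". The sketch below checks, for the ASCII range only, that this accepts the same characters as [a-zA-Z0-9]; the variable names are made up for illustration.

```javascript
const explicitClass = /^[a-zA-Z0-9]$/; // class used in the earlier answer
const negatedClass = /^[^\W_]$/;       // class used in this answer

// Compare both classes on every ASCII character.
let sameForAllAscii = true;
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (explicitClass.test(ch) !== negatedClass.test(ch)) {
    sameForAllAscii = false;
  }
}

console.log(sameForAllAscii);
```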
Upvotes: 1
Reputation: 53291
As I said in the comment, this solution (untested, by the way) will only work if you're just managing <br /> elements. Here:
var text = paragraph.split("<br />"); // now text contains just the text of each line
for (var i = 0; i < text.length - 1; i++) { // skip the last piece: nothing followed it
    text[i] += "<br />"; // restore the <br /> that split() removed from this line
}
The variable text is now an array, where each element is a line of the original paragraph. The line breaks (<br />) have been added back at the end of each line. You mentioned that you want to split on the special characters, but from what I see each line ends in a line break, so this should hopefully have the same effect. Unfortunately I don't have the time to write up a more complete answer at the moment.
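For reference, a slightly more defensive variant of the same idea, as a sketch only. It assumes every address ends in exactly "<br />" and uses a shortened sample of the paragraph; it trims surrounding white-space and drops the empty piece that split() produces after the final line break.

```javascript
// Shortened sample in which every line ends with "<br />" (an assumption).
const paragraph =
  "† Biopharmaceutical Research and Development<br />\n" +
  "‡ Clovis Oncology<br />\n" +
  "¶ Department of Chemistry<br />";

const lines = paragraph
  .split("<br />")                    // cut at each line break
  .map((line) => line.trim())         // drop stray newlines and spaces
  .filter((line) => line.length > 0)  // discard the empty trailing piece
  .map((line) => line + "<br />");    // put the line break back

console.log(lines);
```

A line without a <br />, such as the "|| School of Biological Sciences" line in the question, would stay merged with the line that follows it.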
Upvotes: 1