Reputation: 3773
What I have
string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string a ="http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
What I want
string[] s;
s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
s[1] = "This is a good day";
s[2] = "http://www.a.com/b.png";
s[3] = "We are the Best friendshttp://www.c.com";
Bonus:
If the URL can also be split off like below, even better, but if not, that's OK.
s[3] = "We are the Best friends";
s[4] = "http://www.c.com";
What's the question
I tried to split the string with the code below:
string[] s = Regex.Split(sourceString, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
But the result is not what I want. The Split method seems to remove every substring that matches ImageRegPattern, and I want those to stay in the result. I checked the Regex page on MSDN, and there seems to be no method that meets my need. So how can I do it?
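For reference, Regex.Split itself can keep the separators. The .NET documentation states that if the pattern contains capturing parentheses, the captured text is included in the resulting array. Below is a minimal sketch under that assumption, using a condensed form of ImageRegPattern wrapped in one capturing group. The class and method names are hypothetical, chosen for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ImageUrlSplitter
{
    // The outer parentheses form a capturing group, so Regex.Split puts each
    // matched image URL into the result instead of discarding it.
    // (?:jpg|png|gif) is non-capturing so the extension is not returned twice.
    private const string ImageUrlPattern = @"(http://[\w./]*\.(?:jpg|png|gif))";

    public static string[] SplitKeepingImageUrls(string source)
    {
        return Regex.Split(source, ImageUrlPattern, RegexOptions.IgnoreCase)
            // A match at the very start or end of the input, or two adjacent
            // matches, produce empty strings in the array; drop them.
            .Where(part => part.Length > 0)
            .ToArray();
    }
}
```

With the example string this should give the four pieces listed under "What I want". The trailing http://www.c.com stays attached to the text because it has no image extension.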
Upvotes: 4
Views: 1257
Reputation: 23122
You need something like this method, which finds all the matches first, and then collects them into a list along with the unmatched strings between them.
UPDATE: Added conditional to handle if no matches are found.
// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
private static IEnumerable<string> InclusiveSplit(string source, string pattern)
{
    List<string> parts = new List<string>();
    int currIndex = 0;
    // First, find all the matches. These are your separators.
    MatchCollection matches =
        Regex.Matches(source, pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    // If there are no matches, there's nothing to split, so just return a
    // collection with just the source string in it.
    if (matches.Count < 1)
    {
        parts.Add(source);
    }
    else
    {
        foreach (Match match in matches)
        {
            // If the match begins after our current index, we need to add the
            // portion of the source string between the last match and the
            // current match.
            if (match.Index > currIndex)
            {
                parts.Add(source.Substring(currIndex, match.Index - currIndex));
            }
            // Add the matched value, of course, to make the split inclusive.
            parts.Add(match.Value);
            // Update the current index so we know if the next match has an
            // unmatched substring before it.
            currIndex = match.Index + match.Length;
        }
        // Finally, check if there is a bit of unmatched string at the end of the
        // source string.
        if (currIndex < source.Length)
            parts.Add(source.Substring(currIndex));
    }
    return parts;
}
The output for your example input will be like so:
[0] "http://www.dsa.com/asd/jpg/good.jpg"
[1] "This is a good day"
[2] "http://www.a.com/b.png"
[3] "We are the Best friendshttp://www.c.com"
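The [3] entry above still contains http://www.c.com because that URL has no image extension, so the image pattern never matches it. One way to get the bonus split is to give the separator pattern a fallback alternative that matches any http:// URL. The sketch below reuses the same match-and-collect loop; the pattern and the names UrlSplitter and SplitKeepingUrls are illustrative assumptions. Note that a plain URL directly followed by letters would swallow them, since nothing marks where such a URL ends.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlSplitter
{
    // Assumed pattern: image URLs first, then any other http:// URL. At a given
    // position the second alternative is only tried if the first one fails.
    private static readonly Regex Token = new Regex(
        @"http://[\w./]*\.(?:jpg|png|gif)|http://[\w./]*",
        RegexOptions.IgnoreCase);

    public static List<string> SplitKeepingUrls(string source)
    {
        var parts = new List<string>();
        int current = 0;
        foreach (Match m in Token.Matches(source))
        {
            if (m.Index > current)
                parts.Add(source.Substring(current, m.Index - current)); // text before the URL
            parts.Add(m.Value); // the URL itself
            current = m.Index + m.Length;
        }
        if (current < source.Length)
            parts.Add(source.Substring(current)); // trailing text after the last URL
        return parts;
    }
}
```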
Upvotes: 4
Reputation: 14921
One does not simply underestimate the power of regex:
(.*?)([A-Z][\w\s]+(?=http|$))
Explanation:
(.*?) : group and match everything until a capital letter is found; in this group you'll find the URL
( : start group
[A-Z] : match one capital letter
[\w\s]+ : match any word character (a-z, A-Z, 0-9, _) or whitespace (space, \n, \r, \t, \f) one or more times
(?=http|$) : lookahead; check that what follows is http or the end of the input
) : close group (here you'll find the text)
Note: This solution is for matching the string, not splitting it.
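To see what that pattern actually captures, here is a sketch that walks Regex.Matches and collects both groups (the class and method names are made up for the example). With the question's input it should produce two URL/text pairs. The trailing http://www.c.com is not followed by a capitalized sentence, so it is not part of any match and is dropped. The approach also assumes the URLs contain no capital letters and that each text chunk starts with one.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CapitalLetterMatcher
{
    // Pattern from the answer above. No IgnoreCase: [A-Z] must stay
    // case-sensitive, otherwise the URL letters would count as "capitals".
    private const string Pattern = @"(.*?)([A-Z][\w\s]+(?=http|$))";

    public static List<string> ExtractPairs(string source)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(source, Pattern))
        {
            parts.Add(m.Groups[1].Value); // group 1: everything before the capital letter (the URL)
            parts.Add(m.Groups[2].Value); // group 2: the text up to the next "http" or the end
        }
        return parts;
    }
}
```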
Upvotes: 1
Reputation: 2553
The obvious answer here is of course not to use split, but rather matching the image patterns and retrieving them. That being said, it's not impossible to use split.
string ImageRegPattern = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";
This will match any point in the string that is either followed by an image URL or preceded by .jpg, .gif or .png.
I really don't recommend doing it this way, I'm just saying you can.
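A sketch of feeding such zero-width split points to Regex.Split. It assumes the lookaround groups are written as non-capturing (?:...) groups: Regex.Split adds the text of every capturing group to its result, so capturing groups here would also return each URL and extension a second time. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ZeroWidthSplitter
{
    // Zero-width split points: just before an image URL (lookahead) or just
    // after an image extension (lookbehind). No capturing groups, so Split
    // returns only the pieces between the split points.
    private const string SplitPoints =
        @"(?=http://[\w./]*?\.(?:jpg|png|gif))|(?<=\.(?:jpg|png|gif))";

    public static string[] SplitAtImageUrls(string source)
    {
        return Regex.Split(source, SplitPoints, RegexOptions.IgnoreCase)
            .Where(part => part.Length > 0) // a split point at index 0 yields a leading ""
            .ToArray();
    }
}
```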
Upvotes: 0
Reputation: 1098
I think you need a multi-step process to insert a delimiter that can then be used by the String.Split command:
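string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
// Drop the leading delimiter added when the string starts with an image URL.
if (resultString.StartsWith("|"))
    resultString = resultString.Substring(1);
string[] a = resultString.Split('|');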
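A variant of the same replace-then-split idea, sketched under two assumptions. First, the delimiter is the ASCII unit separator (\u001F), on the guess that it will not appear in the text the way a | might. Second, empty entries are removed with StringSplitOptions.RemoveEmptyEntries instead of trimming only a leading delimiter, which also covers an image URL at the end of the string or two adjacent URLs. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterSplitter
{
    // Assumed delimiter: a control character unlikely to occur in ordinary text.
    private const char Delimiter = '\u001F';

    public static string[] SplitAroundImageUrls(string raw)
    {
        // Surround every image URL with the delimiter; $0 is the whole match.
        string marked = Regex.Replace(
            raw,
            @"http://.*?/\w+\.(?:jpg|png|gif)",
            Delimiter + "$0" + Delimiter,
            RegexOptions.IgnoreCase);

        // RemoveEmptyEntries drops the empty strings produced by a delimiter at
        // either end of the string or between two adjacent URLs.
        return marked.Split(new[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```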
Upvotes: 0