Albert Gao

Reputation: 3773

How do I split a string into a string array without losing the matched parts in C#?

What I have

string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string a ="http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

What I want

string[] s;
s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
s[1] = "This is a good day";
s[2] = "http://www.a.com/b.png";
s[3] = "We are the Best friendshttp://www.c.com";

Bonus:
It would be even better if the trailing URL could be split off as shown below, but it's fine if not.

s[3] = "We are the Best friends";
s[4] = "http://www.c.com";

What's the question
I tried to split the string with the code below:

string[] s = Regex.Split(a, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

But the result is not what I want: the Split method seems to remove every substring that matches ImageRegPattern, whereas I want those substrings to stay in the result. I checked the Regex page on MSDN and could not find a method that does this. How can I do it?

Upvotes: 4

Views: 1257

Answers (4)

FishBasketGordo

Reputation: 23122

You need something like this method, which finds all the matches first, and then collects them into a list along with the unmatched strings between them.

UPDATE: Added conditional to handle if no matches are found.

private static IEnumerable<string> InclusiveSplit
(
    string source, 
    string pattern
)
{
  List<string> parts = new List<string>();
  int currIndex = 0;

  // First, find all the matches. These are your separators.
  MatchCollection matches = 
      Regex.Matches(source, pattern, 
      RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

  // If there are no matches, there's nothing to split, so just return a
  // collection with just the source string in it.
  if (matches.Count < 1)
  {
    parts.Add(source);
  }
  else
  {
    foreach (Match match in matches)
    {
      // If the match begins after our current index, we need to add the
      // portion of the source string between the last match and the 
      // current match.
      if (match.Index > currIndex)
      {
        parts.Add(source.Substring(currIndex, match.Index - currIndex));
      }

      // Add the matched value, of course, to make the split inclusive.
      parts.Add(match.Value);

      // Update the current index so we know if the next match has an
      // unmatched substring before it.
      currIndex = match.Index + match.Length;
    }

    // Finally, check whether there is a bit of unmatched string at the end
    // of the source string.
    if (currIndex < source.Length)
      parts.Add(source.Substring(currIndex));
  }

  return parts;
}

The output for your example input will be like so:

[0] "http://www.dsa.com/asd/jpg/good.jpg"
[1] "This is a good day"
[2] "http://www.a.com/b.png"
[3] "We are the Best friendshttp://www.c.com"

Upvotes: 4

HamZa

Reputation: 14921

One does not simply underestimate the power of regex:

(.*?)([A-Z][\w\s]+(?=http|$))

Explanation:

  • (.*?) : group that lazily matches everything up to the capital letter that starts the text; this group holds the URL
  • ( : start group
    • [A-Z] : match one capital letter
    • [\w\s]+ : match one or more word characters (a-z, A-Z, 0-9, _) or whitespace characters (space, \n, \r, \t, \f)
    • (?=http|$) : lookahead, check that what follows is http or the end of the string (end of line in multiline mode)
    • ) : close group (this group holds the text)


Note: This solution is for matching the string, not splitting it.
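To make the matching approach concrete, here is a minimal sketch of how the two groups could be read back in C#. The class and method names (MatchGroupsSketch, CollectParts) are hypothetical and not part of the answer; only the pattern is taken from it. Note that the pattern is case-sensitive and depends on each text segment starting with a capital letter, so anything after the last such segment (for example a trailing URL) is not matched and will not appear in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class MatchGroupsSketch
{
    // Collects group 1 (whatever precedes the capitalised run, here the URL)
    // and group 2 (the capitalised text run) from every match, in order.
    public static List<string> CollectParts(string source)
    {
        const string pattern = @"(.*?)([A-Z][\w\s]+(?=http|$))";
        var parts = new List<string>();

        foreach (Match m in Regex.Matches(source, pattern))
        {
            // Group 1 can be empty when the text run starts the match.
            if (m.Groups[1].Length > 0)
            {
                parts.Add(m.Groups[1].Value);
            }
            parts.Add(m.Groups[2].Value);
        }

        return parts;
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in CollectParts(a))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because the split points come from the capital letters rather than from the image extensions, this sketch behaves differently from the question's pattern on input whose text starts in lower case.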

Upvotes: 1

melwil

Reputation: 2553

The obvious answer here is of course not to use split, but rather to match the image patterns and retrieve them. That being said, it's not impossible to use split.

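// Non-capturing groups (?:...) are used because Regex.Split adds any captured text to its result.
string ImageRegPattern = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";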

This will match any point in the string that is either followed by an image URL or preceded by .jpg, .png or .gif.

I really don't recommend doing it this way; I'm just saying you can.
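As a related option that is not taken from the answer above: in .NET, Regex.Split includes any text captured by capturing groups in the returned array, so wrapping the question's whole pattern in one capturing group keeps the matched URLs. The sketch below assumes that behaviour; the class and method names (CaptureSplitSketch, SplitKeepingMatches) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CaptureSplitSketch
{
    // The pattern from the question, unchanged.
    private const string ImageRegPattern =
        @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";

    public static string[] SplitKeepingMatches(string source)
    {
        // One capturing group around the whole alternation: Regex.Split then
        // returns each matched URL between the surrounding text pieces.
        string pattern = "(" + ImageRegPattern + ")";

        return Regex.Split(source, pattern, RegexOptions.IgnoreCase)
                    // A match at the very start or end yields an empty piece.
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in SplitKeepingMatches(a))
        {
            Console.WriteLine(part);
        }
    }
}
```

For the question's input this should yield the four elements listed under "What I want"; the trailing http://www.c.com stays attached to the last text piece because it does not match any of the image alternatives.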

Upvotes: 0

Dave Michener

Reputation: 1098

I think you need a multi-step process: first insert a delimiter around each match, then split on that delimiter with the String.Split method:

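// Wrap every image URL in '|' delimiters (assumes '|' never occurs in rawString).
string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
// RemoveEmptyEntries drops the empty pieces produced by a leading, trailing, or doubled delimiter.
string[] parts = resultString.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);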

Upvotes: 0
