Anass
Anass

Reputation: 974

Extract all strings between two strings

I'm trying to develop a method that will match all strings between two strings:

I've tried this but it returns only the first match:

string ExtractString(string s, string start,string end)
        {
            // You should check for errors in real-world code, omitted for brevity

            int startIndex = s.IndexOf(start) + start.Length;
            int endIndex = s.IndexOf(end, startIndex);
            return s.Substring(startIndex, endIndex - startIndex);
        }

Let's suppose we have this string

String Text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2"

I would like a c# function doing the following :

public List<string> ExtractFromString(String Text,String Start, String End)
{
    List<string> Matched = new List<string>();
    .
    .
    .
    return Matched; 
}
// Example of use 

ExtractFromString("A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2","A1","A2")

    // Will return :
    // FIRSTSTRING
    // SECONDSTRING
    // THIRDSTRING

Thank you for your help !

Upvotes: 14

Views: 44249

Answers (5)

PeteGO
PeteGO

Reputation: 5781

Here is a solution using RegEx. Don't forget to include the following using statement.

using System.Text.RegularExpressions

It will correctly return only text between the start and end strings given.

Will not be returned:

akslakhflkshdflhksdf

Will be returned:

FIRSTSTRING
SECONDSTRING
THIRDSTRING

It uses the regular expression pattern [start string].+?[end string]

The start and end strings are escaped in case they contain regular expression special characters.

    private static List<string> ExtractFromString(string source, string start, string end)
    {
        var results = new List<string>();

        string pattern = string.Format(
            "{0}({1}){2}", 
            Regex.Escape(start), 
            ".+?", 
             Regex.Escape(end));

        foreach (Match m in Regex.Matches(source, pattern))
        {
            results.Add(m.Groups[1].Value);
        }

        return results;
    }

You could make that into an extension method of String like this:

public static class StringExtensionMethods
{
    public static List<string> EverythingBetween(this string source, string start, string end)
    {
        var results = new List<string>();

        string pattern = string.Format(
            "{0}({1}){2}",
            Regex.Escape(start),
            ".+?",
             Regex.Escape(end));

        foreach (Match m in Regex.Matches(source, pattern))
        {
            results.Add(m.Groups[1].Value);
        }

        return results;
    }
}

Usage:

string source = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
string start = "A1";
string end = "A2";

List<string> results = source.EverythingBetween(start, end);

Upvotes: 17

Flavia Obreja
Flavia Obreja

Reputation: 1237

    private static List<string> ExtractFromBody(string body, string start, string end)
    {
        List<string> matched = new List<string>();

        int indexStart = 0;
        int indexEnd = 0;

        bool exit = false;
        while (!exit)
        {
            indexStart = body.IndexOf(start);

            if (indexStart != -1)
            {
                indexEnd = indexStart + body.Substring(indexStart).IndexOf(end);

                matched.Add(body.Substring(indexStart + start.Length, indexEnd - indexStart - start.Length));

                body = body.Substring(indexEnd + end.Length);
            }
            else
            {
                exit = true;
            }
        }

        return matched;
    }

Upvotes: 35

Macros
Macros

Reputation: 7119

You can split the string into an array using the start identifier in following code:

String str = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";

String[] arr = str.Split("A1");

Then iterate through your array and remove the last 2 characters of each string (to remove the A2). You'll also need to discard the first array element as it will be empty assuming the string starts with A1.

Code is untested, currently on a mobile

Upvotes: 2

nawfal
nawfal

Reputation: 73163

This is a generic solution, and I believe more readable code. Not tested, so beware.

public static IEnumerable<IList<T>> SplitBy<T>(this IEnumerable<T> source, 
                                               Func<T, bool> startPredicate,
                                               Func<T, bool> endPredicate, 
                                               bool includeDelimiter)
{
    var l = new List<T>();
    foreach (var s in source)
    {
        if (startPredicate(s))
        {
            if (l.Any())
            {
                l = new List<T>();
            }
            l.Add(s);
        }
        else if (l.Any())
        {
            l.Add(s);
        }

        if (endPredicate(s))
        {
            if (includeDelimiter)
                yield return l;
            else
                yield return l.GetRange(1, l.Count - 2);

            l = new List<T>();
        }
    }
}

In your case you can call,

var text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
var splits = text.SplitBy(x => x == "A1", x => x == "A2", false);

This is not the most efficient when you do not want the delimiter to be included (like your case) in result but efficient for opposite cases. To speed up your case one can directly call the GetEnumerator and make use of MoveNext.

Upvotes: 0

Zaid Masud
Zaid Masud

Reputation: 13443

text.Split(new[] {"A1", "A2"}, StringSplitOptions.RemoveEmptyEntries);

Upvotes: 4

Related Questions