user8037953

Search in a string and split the original string by the found content

I'm wondering if there is a good way to search a string for several substrings and split the original string around each match. For example, given the string:

string str = "you androids don't exactly cover for each other in times of stress. " +
    "i think you're right it would seem we lack a specific talent you humans possess " +
    "i believe it's called empathy";

and the search strings, for example:

var sList = new List<string> {"for each other", "talent", "you humans"};

The result, with the original string split around the found strings, would be:

you androids don't exactly cover 
for each other 
in times of stress. i think you're right it would seem we lack a specific 
talent 
you humans  
possess i believe it's called empathy

If the same text appears in two different search strings (here, "you"):

var sList = new List<string> {"for each other", "other in", "talent", "you humans", "you"};

The correct output should be this:

you 
androids don't exactly cover 
for each other
other in
times of stress. i think you're right it would seem we lack a specific 
talent 
you
you humans  
possess i believe it's called empathy

Upvotes: 1

Views: 99

Answers (2)

LB2

Reputation: 4860

You can use regular expressions to find every occurrence of each search string, mark which character ranges of the original string are covered by a match (which also handles overlapping matches), and then treat the uncovered ranges as the gaps in between:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;

public class Program
{
    public static void Main()
    {
        string str = "you androids don't exactly cover for each other in times of stress. i think youre right it would seem we lack a specific talent you humans possess i believe it's called empathy"; 
        var sList = new List<string> {"for each other", "other in", "talent", "you humans", "you"};
        // Tracks which characters are covered by at least one match (bool arrays default to false).
        var chRangeMap = new bool[str.Length];

        var matchedTokenMap = sList
            .Select(i => "\\b" + Regex.Escape(i) + "\\b")
            .SelectMany(p => new Regex(p).Matches(str).Cast<Match>())
            .Select(m => new 
                    { 
                        StartIndex = m.Index,
                        EndIndex = m.Index + m.Length,
                        Length = m.Length
                    })
            .Select(r => {
                // Side effect: mark every character of this match as covered.
                for (var i = r.StartIndex; i < r.EndIndex; ++i) chRangeMap[i] = true;
                return r;
                })
            // Materialize now so chRangeMap is fully populated before the gaps are computed.
            .ToList();

        var fullTokenized = 
            matchedTokenMap.Concat(
                GetArrayRanges(chRangeMap, false)
                    .Select(r => new 
                            { 
                                StartIndex = r.Item1,
                                EndIndex = r.Item2,
                                Length = r.Item2 - r.Item1
                            })
            )
            .OrderBy(k => k.StartIndex).ThenBy(sk => sk.Length);

        foreach(var token in fullTokenized)
        {
            WriteTrimmed(str.Substring(token.StartIndex, token.Length));
        }
    }

    private static void WriteTrimmed(string str)
    {
        str = str.Trim();
        if (!string.IsNullOrWhiteSpace(str))
        {
            Console.WriteLine(str);
        }
    }

    private static IEnumerable<Tuple<int, int>> GetArrayRanges(bool[] array, bool seekValue)
    {
        int? rangeStart = null;

        for(var i = 0; i < array.Length; ++i)
        {
            if (array[i] == seekValue)
            {
                if (!rangeStart.HasValue)
                {
                    rangeStart = i;
                }
            }
            else
            {
                if (rangeStart.HasValue)
                {
                    yield return Tuple.Create(rangeStart.Value, i);
                    rangeStart = null;
                }
            }
        }

        if (rangeStart.HasValue)
        {
            yield return Tuple.Create(rangeStart.Value, array.Length);
        }
    }
}

DotNETFiddle of the code.
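
The same idea can also be sketched without the `bool[]` map, by sorting the matches and sweeping over them once while remembering how far the string has been covered so far. This is only a minimal sketch under the same assumptions as the code above (whole-word matching via `\b`, pieces trimmed, empty pieces dropped); `SplitSketch`, `Tokenize` and `AddTrimmed` are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Hypothetical helper: returns the matched search strings and the text
    // between them, in order of appearance in the original string.
    public static List<string> Tokenize(string text, IEnumerable<string> searchStrings)
    {
        // Collect every whole-word occurrence of every search string,
        // ordered by position and then by length (shorter match first).
        var matches = searchStrings
            .SelectMany(s => Regex.Matches(text, @"\b" + Regex.Escape(s) + @"\b").Cast<Match>())
            .OrderBy(m => m.Index)
            .ThenBy(m => m.Length)
            .ToList();

        var result = new List<string>();
        var covered = 0; // end index of the furthest match seen so far

        foreach (var m in matches)
        {
            // Anything between the covered region and this match is a gap.
            if (m.Index > covered)
            {
                AddTrimmed(result, text.Substring(covered, m.Index - covered));
            }

            AddTrimmed(result, m.Value);
            covered = Math.Max(covered, m.Index + m.Length);
        }

        // Trailing text after the last match.
        if (covered < text.Length)
        {
            AddTrimmed(result, text.Substring(covered));
        }

        return result;
    }

    // Adds the piece without surrounding whitespace, skipping empty pieces.
    private static void AddTrimmed(List<string> result, string piece)
    {
        piece = piece.Trim();
        if (piece.Length > 0)
        {
            result.Add(piece);
        }
    }
}
```

Calling `SplitSketch.Tokenize(str, sList)` with the `str` and `sList` from the program above should yield the same tokens that program prints, overlapping matches such as "for each other" and "other in" included.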

Upvotes: 0

degant

Reputation: 4981

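Try this (`str` and `sList` are the variables from the question; the snippet needs `using System.Linq;` and `using System.Text.RegularExpressions;`):

List<string> parts = new List<string> { str };
// For each separator, split every current part into (before, separator, after);
// parts that do not contain the separator fall through to the "(.+)" alternative.
// Regex.Escape guards against separators that contain regex metacharacters.
sList.ForEach(separator => parts = parts
    .SelectMany(part => Regex.Match(part, "(.*) ?(\\b" + Regex.Escape(separator) + "\\b) ?(.*)|(.+)")
        .Groups
        .Cast<Group>()
        .Where(group => group.Success)
        .Select(group => group.Value)
        .Skip(1)) // skip group 0, which is the whole match
    .ToList());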

parts = parts
    .Where(x => !string.IsNullOrWhiteSpace(x))
    .ToList();

Output:

you
androids don't exactly cover 
for each other
in times of stress. i think youre right it would seem we lack a specific 
talent
you
humans
possess i believe it's called empathy

Dotnet Fiddle Demo

Upvotes: 1
