vijesh

Read a string into a formatted array using regex and C#

I am capturing a mainframe screen using C#, and I have to read the labels that correspond to the text-entry regions on the screen. Currently I am reading them from the captured image using the Tesseract OCR plugin, which returns a string. I want to split that string on certain characters in it. The characters are the following:

{ '@', '<', '>', '=', '$', '%', '&' }

A sample string to split is shown below:

first name => saran    address @> my address

Is there any way to split this string with a regex into an array in the following format?

[0]: "first name"
[1]: "=> saran" 
[2]: "address" 
[3]: "@> my address"
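
For illustration, here is a minimal sketch of a regex split that yields exactly the array above. It rests on an assumption the question does not state: that the OCR output keeps two or more spaces between a value ("saran") and the next label ("address"), and at least one space before each marker such as "=>" or "@>". The class and method names (ScreenLabelSplitter, Split) are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class ScreenLabelSplitter
{
    // Split on either:
    //  * a whitespace run that is immediately followed by one of @ < > = $ % &
    //    (so "=>" and "@>" start a new piece), or
    //  * any run of two or more whitespace characters
    //    (ASSUMPTION: the OCR output keeps a wide gap between a value and the next label).
    // Inside a character class, '$' is a literal dollar sign.
    private static readonly Regex Separator = new Regex(@"\s+(?=[@<>=$%&])|\s{2,}");

    public static string[] Split(string text)
    {
        return Separator
            .Split(text.Trim())
            .Where(piece => piece.Length > 0) // drop blank pieces, e.g. from an all-whitespace input
            .ToArray();
    }
}
```

With the sample string, Split returns "first name", "=> saran", "address" and "@> my address". A label written directly against its marker (for example "first name=>") would not be split by this pattern, because it requires whitespace before the marker.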

Answers (1)

Enigmativity

This gets you very close (but not using Regex):

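using System.Collections.Generic;
using System.Linq;

// Characters that start a new piece; a run of them (e.g. "=>") stays together.
char[] splitters = new[] { '@', '<', '>', '=', '$', '%', '&' };

string text = "first name => saran    address @> my address";

string[] results =
    text
        // Build a list of character groups, starting with one empty group.
        .Aggregate(new List<List<char>>() { new List<char>() }, (a, c) =>
        {
            var l = a.Last();
            // A splitter starts a new group unless the current group
            // consists only of splitters (so "=>" is kept as one marker).
            if (splitters.Contains(c) && !l.All(x => splitters.Contains(x)))
            {
                l = new List<char>() { c };
                a.Add(l);
            }
            else
            {
                l.Add(c);
            }
            return a;
        })
        // Turn each character group back into a string.
        .Select(x => new string(x.ToArray()))
        .ToArray();

// Print one piece per line.
foreach (string r in results) System.Console.WriteLine(r);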

There's nothing in your description about how to separate "saran" from "address". Other than that, this is tested and produces the following:

first name  
=> saran    address  
@> my address 
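
The Aggregate call can also be written as a plain loop, which makes the grouping rule easier to follow: a marker character starts a new piece unless the previous character was also a marker, so "=>" and "@>" stay together. This sketch additionally trims each piece and, as an assumption the question does not confirm, treats a run of two or more whitespace characters as the boundary between a value and the next label, which separates "saran" from "address". The names FieldSplitter and Split are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class FieldSplitter
{
    // Marker characters from the question; a run of them (e.g. "=>") begins a new piece.
    private static readonly char[] Splitters = { '@', '<', '>', '=', '$', '%', '&' };

    private static bool IsSplitter(char c) => Array.IndexOf(Splitters, c) >= 0;

    public static List<string> Split(string text)
    {
        var pieces = new List<string>();
        var current = new StringBuilder();

        // Emit the current piece (trimmed) if it is not blank, then start a fresh one.
        void Flush()
        {
            string piece = current.ToString().Trim();
            if (piece.Length > 0)
            {
                pieces.Add(piece);
            }
            current.Clear();
        }

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (IsSplitter(c) && (i == 0 || !IsSplitter(text[i - 1])))
            {
                // First marker of a run: same rule as the Aggregate version.
                Flush();
            }
            else if (char.IsWhiteSpace(c) && i + 1 < text.Length && char.IsWhiteSpace(text[i + 1]))
            {
                // ASSUMPTION: two or more consecutive whitespace characters end a value.
                Flush();
            }

            current.Append(c);
        }

        Flush();
        return pieces;
    }
}
```

For the sample string this returns "first name", "=> saran", "address" and "@> my address". If the OCR output ever collapses that gap to a single space, no rule based only on characters can recover the boundary, which is the ambiguity noted above.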
