user8511320

Locate RegEx match then extract

I am trying to read text from a RichTextBox in order to locate the first occurrence of a matched expression. I would then like to extract the string that satisfies the query so I can use it as a variable. Below is the basic bit of code I have to start off with and build upon.

private string returnPostcode()
{
        string[] allLines = rtxtDocViewer.Text.Split('\n');
        string expression = "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$";

        foreach (string line in allLines)
        {
            if (Regex.Matches(line, expression, RegexOptions.Count > 0)
            {                    
                //extract and return the string that is found
            }
        }
  }     

An example of what the RichTextBox contains is below. I want to extract "E12 8SD", which the above regex should be able to find. Thanks

Damon Brown
Flat B University Place
26 Park Square 
London
E12 8SD
Mobile: 1111 22222
Email: [email protected]  Date of birth: 21/03/1986
Gender: Male
Marital Status: Single
Nationality: English
Summary
I have acquired a multifaceted skill set with experience using several computing platforms.

Upvotes: 2

Views: 223

Answers (1)

Wiktor Stribiżew

Reputation: 627087

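You need to use Regex.IsMatch and remove the RegexOptions.Count > 0 part. RegexOptions has no Count member; Count belongs to the MatchCollection that Regex.Matches returns, so Regex.Matches(line, expression).Count > 0 would also work, but IsMatch is simpler.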

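string[] allLines = s.Split('\n'); // s holds the text to search, e.g. rtxtDocViewer.Text
// Note: [AZa-z] in the original pattern was a typo for [A-Za-z]
string expression = "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$";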

foreach (string line in allLines)
{
    if (Regex.IsMatch(line, expression)) // Regex.IsMatch will check if a string matches the regex
    {                     
        Console.WriteLine(line);         // Print the matched line
    }
}

See the IDEONE Demo

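It is quite possible that your text contains CR+LF line breaks. In that case, splitting on '\n' leaves a trailing '\r' on each line, and the $ anchor no longer matches at the end of the postcode. If so, adjust your code as follows: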

string[] allLines = s.Split(new[] {"\r\n"}, StringSplitOptions.RemoveEmptyEntries);

See this demo
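If you do not know in advance which line-break style the text uses, you can split on both. The following is only a minimal sketch: the class name PostcodeLines and the method FindMatchingLines are hypothetical names made up for illustration, and the pattern is passed in by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question's code.
public static class PostcodeLines
{
    // Returns every line of text that matches pattern,
    // accepting both "\r\n" and "\n" as line separators.
    public static List<string> FindMatchingLines(string text, string pattern)
    {
        var result = new List<string>();

        // Splitting on both separators means no stray '\r' is left
        // at the end of a line, so a trailing $ anchor still works.
        string[] lines = text.Split(
            new[] { "\r\n", "\n" },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            if (Regex.IsMatch(line, pattern))
            {
                result.Add(line);
            }
        }

        return result;
    }
}
```

You would call it as PostcodeLines.FindMatchingLines(rtxtDocViewer.Text, expression) and take the first element of the result, if any.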

UPDATE

To just extract the code with your regex, you need not split the contents into lines, just use a Regex.Match on the whole text:

string s = "Damon Brown\nFlat B University Place\n26 Park Square \nLondon\nTW1 1AJ Twickenham    Mobile: +44 (0) 7711223344\nMobile: 1111 22222\nEmail: [email protected]    Date of birth: 21/03/1986\nGender: Male\nMarital Status: Single\nNationality: English\nSummary\nI have acquired a multifaceted skill set with experience using several computing platforms."; 
string expression = @"(?i)\b(gir 0a{2})|((([a-z][0-9]{1,2})|(([a-z][a-hj-y][0-9]{1,2})|(([a-z][0-9][a-z])|([a-z][a-hj-y][0-9]?[a-z])))) [0-9][a-z]{2})\b";
Match res = Regex.Match(s, expression);
if (res.Success)
    Console.WriteLine(res.Value); // => TW1 1AJ

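I also replaced the ^ and $ anchors with \b word boundaries so the postcode can be matched anywhere in the text, and removed the uppercase ranges in favor of the case-insensitive modifier (?i).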

See this IDEONE demo.
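Since you want the value back as a variable, the same idea can be wrapped in a method that returns the first match. This is just a sketch under a few assumptions: the class name PostcodeExtractor and the method FindFirstPostcode are hypothetical names, and returning null when nothing is found is one possible convention (you may prefer an empty string).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question's code.
public static class PostcodeExtractor
{
    // Same pattern as in the update, created once and reused.
    private static readonly Regex PostcodeRegex = new Regex(
        @"(?i)\b(gir 0a{2})|((([a-z][0-9]{1,2})|(([a-z][a-hj-y][0-9]{1,2})|(([a-z][0-9][a-z])|([a-z][a-hj-y][0-9]?[a-z])))) [0-9][a-z]{2})\b");

    // Returns the first postcode found in text, or null when there is none.
    public static string FindFirstPostcode(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return null;
        }

        Match match = PostcodeRegex.Match(text);
        return match.Success ? match.Value : null;
    }
}
```

In your form you could then write string postcode = PostcodeExtractor.FindFirstPostcode(rtxtDocViewer.Text); and check the result for null before using it.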

Upvotes: 1
