CrunchyTaco

Reputation: 41

Parsing a short string from html source

<br />
Your coupon for 50% off MSRP - Inline is: XXXXXXXXXXX<br />
Your coupon for 50% off MSRP - Outdoor is: XXXXXXXXXXX<br /><br />

I want to parse out the coupon code. I currently have is(.+?)<br> but the result also includes the <br> at the end.

Upvotes: 2

Views: 84

Answers (2)

Shar1er80

Reputation: 9041

Try a lookbehind/lookahead pattern like this:

".*?coupon.*?(?<=: )(\\w+)(?=<br />|<br/>)"

It captures the alphanumeric code in group 1. The code must be preceded by the word "coupon" and sit between ": " and either "<br />" or "<br/>".

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string html = "<br />\n" +
            "Your coupon for 50% off MSRP - Inline is: XXXXXXXXXXX<br />" +
            "Your coupon for 50% off MSRP - Outdoor is: XXXXXXXXXXX<br /><br />";

        // Group 1 holds only the code; the lookahead checks for the <br /> tag without consuming it.
        MatchCollection matches = Regex.Matches(html, ".*?coupon.*?(?<=: )(\\w+)(?=<br />|<br/>)");
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Groups[1]);
        }
    }
}

Results:

XXXXXXXXXXX
XXXXXXXXXXX
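
If the lookarounds feel heavy, a plain capture group can do the same job. The sketch below is only one possible variation, not the pattern above: the class name CouponRegex, the method Extract and the group name "code" are made up for illustration, and it assumes every code is made of word characters and directly follows "is: ".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CouponRegex
{
    // Hypothetical helper: collects each code that follows "is: " and is
    // immediately followed by a <br> tag in any of its common spellings.
    public static List<string> Extract(string html)
    {
        var codes = new List<string>();

        // [^<]*? keeps the match inside one line of text (it cannot cross a tag).
        // <br\s*/?> accepts <br>, <br/> and <br />.
        const string pattern = @"coupon[^<]*?is: (?<code>\w+)<br\s*/?>";

        foreach (Match m in Regex.Matches(html, pattern))
        {
            // Read the named group, not m.Value: m.Value is the whole match,
            // which includes the surrounding text and the <br> tag.
            codes.Add(m.Groups["code"].Value);
        }
        return codes;
    }
}
```

Calling CouponRegex.Extract(html) on the sample markup from the question should return the two codes without the trailing tag, because only the named group is read.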


Upvotes: 1

John Smith

Reputation: 7407

You should be able to do this without even using Regex. Something like

string s = "Your coupon for 50% off MSRP - Outdoor is: XXXXXXXXXXX";
Console.WriteLine(s.Substring(s.LastIndexOf(' ') + 1));

should work as long as the coupon code is always the last part of the string, with a space prefixing it.

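EDIT: after seeing your edit showing that the strings end with <br>, one alternative is to .Replace the tag with an empty string. Note that this assumes the tag is written exactly as <br>; with <br /> the last space in the string is inside the tag, so LastIndexOf(' ') would no longer point at the code.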

string s = "Your coupon for 50% off MSRP - Outdoor is: XXXXXXXXXXX<br>";
Console.WriteLine(s.Substring(s.LastIndexOf(' ') + 1).Replace("<br>","")); 
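
For the full markup in the question, which uses <br /> and holds several coupons in one string, the same no-regex idea could be sketched roughly as follows. The names CouponSplitter and Extract are hypothetical, and the code assumes each coupon is introduced by the literal text "is: " and runs up to the next <br tag.

```csharp
using System;
using System.Collections.Generic;

public static class CouponSplitter
{
    // Hypothetical helper: splits the markup at every "<br" and takes whatever
    // follows the last "is: " in each piece.
    public static List<string> Extract(string html)
    {
        const string marker = "is: ";
        var codes = new List<string>();

        // Splitting on "<br" treats <br>, <br/> and <br /> alike. The rest of
        // the tag (">", "/>" or " />") lands at the start of the next piece,
        // so it never trails the code.
        string[] pieces = html.Split(new[] { "<br" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string piece in pieces)
        {
            int at = piece.LastIndexOf(marker, StringComparison.Ordinal);
            if (at >= 0)
            {
                codes.Add(piece.Substring(at + marker.Length).Trim());
            }
        }
        return codes;
    }
}
```

Pieces without the marker, such as the leftover " />" fragments, are skipped, so empty lines in the markup do not produce entries.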

Upvotes: 0
