splzkai

Reputation: 5

C# Regex remove href

I want to strip the surrounding tags (including the href) from markup like this:

<td class="name"><a href="/leagues/euw/633">Apdo Dog2</a></td>

I just want the "Apdo Dog2" part of the text. Any ideas?

Here's my code:

private void button1_Click(object sender, EventArgs e)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("SITE");
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();

    StreamReader stream = new StreamReader(response.GetResponseStream());

    string final_response = stream.ReadToEnd();

    Regex r = new Regex(@"\<[^\>]+\>(.[^\<]+)</[^\>]+\>", RegexOptions.Singleline);
    Match m = r.Match(final_response);

    richTextBox1.Text = m.Value;
}

Upvotes: 0

Views: 1354

Answers (3)

Noctis

Reputation: 11763

You can have a look at my answer here and do something similar I guess.

The only difference is that you'll remove everything inside the angle brackets, including the brackets themselves.

Other than that, keep in mind that the general consensus is not to mix regex with HTML :)

This should work:

void Main()
{
    // your input
    String input = @"<td class=""name""><a href=""/leagues/euw/633"">Apdo Dog2</a></td>";
    // collects every character that is outside a tag
    StringBuilder sb = new StringBuilder();
    bool inside = false;
    // analyze string
    for (int i = 0; i < input.Length; i++)
    {
        // special case, start bracket: we are now inside a tag
        if (input[i].Equals('<'))
        {
            inside = true;
        }
        // special case, close bracket: the tag ends, skip the bracket itself
        else if (input[i].Equals('>'))
        {
            inside = false;
            continue;
        }

        // add if needed
        if (!inside)
            sb.Append(input[i]);
    }
    var result = sb.ToString(); // -> holds: "Apdo Dog2"
}

Upvotes: -1

hwnd

Reputation: 70732

Any ideas? Yes: you should use a parser such as HtmlAgilityPack to extract these values.

You don't need to escape the angle brackets; they have no special meaning in a regex. The main problem is the dot (.): it can match the < that opens the next tag, so remove it. Then use the Match.Groups property to read the captured group instead of the whole match.

Regex r = new Regex(@"<[^>]+>([^<]+)</[^>]+>");
Match m = r.Match(final_response);
richTextBox1.Text = m.Groups[1].Value;

Note: With the dot removed, only negated character classes remain, so the Singleline (dotall) option has no effect and you can drop it.
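To make the difference between Match.Value and Match.Groups[1].Value concrete, here is a minimal sketch using the same pattern. The NameExtractor class, the ExtractAll helper, and the second table cell (href ending in 634, "Player Two") are made up for illustration; only the first cell comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // Same pattern as above: an opening tag, text containing no '<', then a closing tag.
    private static readonly Regex TagText = new Regex(@"<[^>]+>([^<]+)</[^>]+>");

    // Hypothetical helper: returns the captured text of every match, not only the first.
    public static List<string> ExtractAll(string html)
    {
        var names = new List<string>();
        foreach (Match m in TagText.Matches(html))
        {
            names.Add(m.Groups[1].Value.Trim());
        }
        return names;
    }

    public static void Main()
    {
        // The second cell is invented sample data, not taken from the real page.
        string html =
            "<td class=\"name\"><a href=\"/leagues/euw/633\">Apdo Dog2</a></td>" +
            "<td class=\"name\"><a href=\"/leagues/euw/634\">Player Two</a></td>";

        Match first = TagText.Match(html);
        Console.WriteLine(first.Value);           // whole match, tags included
        Console.WriteLine(first.Groups[1].Value); // captured text only

        foreach (string name in ExtractAll(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

Keep in mind that on a full page this pattern matches any element that directly contains text (a title or span, for example), not just the name cells, which is one more reason to prefer a parser for anything beyond a quick extraction.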


Upvotes: 2

Federico Piazza

Reputation: 30995

You can use this regex:

<a.*?>(.*?)<\/a>
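As a rough sketch of how this pattern could be applied from C#, assuming you only need the first link: the AnchorText class and its First method are hypothetical names, and the captured text is read from group 1 rather than from the whole match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorText
{
    // The pattern from this answer. In a C# verbatim string the '/' needs no escaping.
    private const string Pattern = @"<a.*?>(.*?)</a>";

    // Hypothetical helper: returns the text of the first <a> element,
    // or an empty string when the input contains no match.
    public static string First(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string input = @"<td class=""name""><a href=""/leagues/euw/633"">Apdo Dog2</a></td>";
        Console.WriteLine(First(input));
    }
}
```

One caveat: because the pattern starts with a bare <a followed by a lazy wildcard, it also matches tags whose names merely begin with "a", such as abbr. Tightening the opening to <a\b[^>]*> avoids that, though a parser remains the more robust choice.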


Upvotes: 2
