D-W

Reputation: 5341

How to extract email from html link

Hi, I have a CSV file whose email column I need to clean up. The values appear in the CSV as follows:

<a href=\mailto:[email protected]\">[email protected]</a>"
<a href=\mailto:[email protected]\">[email protected]</a>"

etc...

So I want to remove <a href=\mailto:[email protected]\"> and </a>" and keep just [email protected]

I have the following

foreach (var clientI in clientImportList)
{
    newClient = new DomainObjects.Client();
    // Remove unwanted email text??
    newClient.Email = clientI.Email;
}

Upvotes: 2

Views: 1490

Answers (4)

Yosi Dahari

Reputation: 7009

I would suggest using HtmlAgilityPack rather than parsing it yourself:

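using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes returns null when nothing matches, so guard against that.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string href = link.Attributes["href"].Value;
        // use "mailto:[email protected]" here..
    }
}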

Upvotes: 3

thewisegod

Reputation: 1542

I usually write small utility classes and extension methods to handle things like this. Since this probably won't be the last time you need something like it, you could do the following:

Create an extension method on the string class:

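public static class StringExtensions
{
    // Returns the text between the first occurrence of front and the next occurrence of back.
    public static string ExtractMiddle(this string text, string front, string back)
    {
        // Skip past the whole front marker, not just its first character.
        text = text.Substring(text.IndexOf(front) + front.Length);
        return text.Remove(text.IndexOf(back));
    }
}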

And then call it like this (the naming could be better, but you get the point):

string emailAddress = text.ExtractMiddle(">", "<");

Upvotes: -1

Shawn Darichuk

Reputation: 390

You can test regular expressions here: https://regex101.com/

Using your example, this seems to work:

mailto:(.*?)\\">

The namespace needed for regex is:

using System.Text.RegularExpressions;
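
As a rough sketch of how that pattern might be applied in C#: the class name EmailExtractor and the method Extract are hypothetical, and returning the input unchanged when nothing matches is an assumption about what you would want for cells that are already plain addresses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailExtractor
{
    // Matches "mailto:", then lazily captures everything up to the literal \"> sequence.
    // In the verbatim string, \\ is a regex-escaped backslash and "" is a single quote character.
    private static readonly Regex MailtoPattern = new Regex(@"mailto:(.*?)\\"">");

    // Returns the captured address, or the input unchanged when the pattern does not match
    // (assumed fallback, not part of the original answer).
    public static string Extract(string cell)
    {
        Match match = MailtoPattern.Match(cell);
        return match.Success ? match.Groups[1].Value : cell;
    }
}
```

Inside the loop from the question, this could then be used as newClient.Email = EmailExtractor.Extract(clientI.Email);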

Upvotes: 0

EAnders

Reputation: 112

If you want to do it the index way, something like:

const string start = "<a href=\\mailto:";
const string end = "\\\">";
string input = "<a href=\\mailto:[email protected]\\\">[email protected]</a>\"";
// The address begins right after the start marker and stops where the end marker begins.
int index1 = input.IndexOf(start);
int startPosition = index1 + start.Length;
int endPosition = input.IndexOf(end);
string email = input.Substring(startPosition, endPosition - startPosition);

Upvotes: -1
