TheEdge

Reputation: 9861

An elegant way in C# to separate a comma separated list of email addresses

Looking on SO, there are various approaches to this problem; however, the recommended solution, for instance, does not deal with "Last, First", and the suggestion posted by richard in that post is missing the code for SetUpTextFieldParser().

I have the following list of email addresses as a string:

string str = "Last, First <[email protected]>, [email protected], First Last <[email protected]>, \"First Last\" <[email protected]>, \"Last, First\" <[email protected]>";

The current code does:

str.Split(',');

which produces an incorrect list because of the comma in:

"Last, First"

Has anyone got something elegant to share so that I end up with an array of strings in the form:

Last, First <[email protected]>
[email protected]
First Last <[email protected]>
"First Last" <[email protected]>
"Last, First" <[email protected]>

EDIT - SOLUTION

I ended up using Yacoub Massad's solution as it was simple (regular expressions would be hard to maintain in my dev group as not everyone understands them). Below is the code (Fiddle) with some additions and simplistic testing to make sure all was well:


using System;
using System.Collections.Generic;
using System.Net.Mail;

public class Program
{
    public static void Main()
    {
        //https://msdn.microsoft.com/en-us/library/system.net.mail.mailaddress(v=vs.110).aspx
        //Some esoteric "comment" formats as well as a trailing comma in case someone did not tidy up
        string emails = "Last, First <[email protected]>, [email protected], First Last <[email protected]>, \"First Last\" <[email protected]>, \"Last, First\" <[email protected]>,  (comment)\"First, Last\"(comment)<(comment)joe(comment)@(comment)there.com(comment)>(comment),";
        List<string> result = new List<string>();

        Console.WriteLine("LOOP");
        while (true)
        {
            int position_of_at = emails.IndexOf("@");
            if (position_of_at == -1)
            {
                break;
            }

            int position_of_comma = emails.IndexOf(",", position_of_at);
            if (position_of_comma == -1)
            {
                result.Add(emails);
                break;
            }

            string email = emails.Substring(0, position_of_comma);
            result.Add(email);
            emails = emails.Substring(position_of_comma + 1);
        }
        Console.WriteLine("/LOOP");

        //Do some very basic validation of above code
        var i = 1;
        if (result.Count == 6)
            Console.WriteLine("SUCCESS: " + result.Count);
        else
            Console.WriteLine("FAILURE: " + result.Count);
        foreach (string emailAddress in result)
        {
            Console.WriteLine("==== " + i.ToString());
            Console.WriteLine(emailAddress);
            Console.WriteLine("/====");
            MailAddress mailAddress = new MailAddress(emailAddress);
            Console.WriteLine(mailAddress.DisplayName);
            Console.WriteLine("---- " + i.ToString());
            i++;
        }
    }
}

Upvotes: 5

Views: 4541

Answers (7)

ohavryl

Reputation: 407

Try

UserEmails?.Split(';',',',' ','\n','\t').Where(x => !string.IsNullOrWhiteSpace(x)).ToList();

Upvotes: 0

pbz

Reputation: 9095

Here's a version that handles a few more edge cases and has fewer allocations:

public static List<string> ExtractEmailAddresses(string text)
{
    var items = new List<string>();

    if (String.IsNullOrEmpty(text))
    {
        return items;
    }

    int start = 0;
    bool foundAt = false;
    int comment = 0;

    for (int i = start; i < text.Length; i++)
    {
        switch (text[i])
        {
            case '@':
                if (comment == 0) { foundAt = true; }
                break;
            case '(':
                comment++;
                break;
            case ')':
                comment--;
                break;
            case ',':
                HandleLastBlock(i);
                break;
        }
    }

    HandleLastBlock(text.Length);

    return items;

    void HandleLastBlock(int end)
    {
        if (comment == 0 && foundAt && start < end - 1)
        {
            var email = new System.Net.Mail.MailAddress(text.Substring(start, end - start));
            items.Add(email.Address);
            start = end + 1;
            foundAt = false;
        }
    }
}

Upvotes: 0

w.b

Reputation: 11228

You can use Regex.Split with the pattern @"(?<=@\S*)\s+", which splits on a space (or spaces) preceded by a word containing @:

string str = "Last, First <[email protected]>, [email protected], First Last <[email protected]>,  \"First Last\" <[email protected]>, \"Last, First\" <[email protected]>";

string[] arr = Regex.Split(str, @"(?<=@\S*)\s+");

foreach (var s in arr)
    Console.WriteLine(s);

output:

Last, First <[email protected]>,
[email protected],
First Last <[email protected]>,
"First Last" <[email protected]>,
"Last, First" <[email protected]>
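As the output shows, every piece except the last keeps its separating comma. Below is a minimal follow-up sketch that strips it, assuming the same pattern. The class and method names (RegexSplitSketch, SplitAddresses) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSplitSketch
{
    // Same pattern as above: split on whitespace that follows a token containing '@'.
    private static readonly Regex SplitAfterAddress = new Regex(@"(?<=@\S*)\s+");

    // Hypothetical helper: split the list, then remove the trailing separator comma
    // from each piece. Empty pieces (e.g. from trailing whitespace) are dropped.
    public static string[] SplitAddresses(string input)
    {
        return SplitAfterAddress.Split(input)
            .Select(s => s.Trim().TrimEnd(','))
            .Where(s => s.Length > 0)
            .ToArray();
    }
}
```

For example, with hypothetical example.com addresses, SplitAddresses("Last, First <last@example.com>, first@example.com") returns the two entries "Last, First <last@example.com>" and "first@example.com". Like the pattern itself, this assumes there is no whitespace between an address and the comma that follows it.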

Upvotes: 0

amit dayama

Reputation: 3326

The shortest method would be:

string str = "Last, First <[email protected]>, [email protected], First Last <[email protected]>, \"First Last\" <[email protected]>, \"Last, First\" <[email protected]>";
string[] separators = new string[] { "com>,", "com,", "com>", "com" };
var outputEmail = str.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    .Where(s => s.Contains("@"))
    .Select(s => s.Contains('<') ? (s + "com>").Trim() : (s + "com").Trim());
foreach (var email in outputEmail)
{
    MessageBox.Show(email);
}

Upvotes: 0

Matias Cicero

Reputation: 26301

Here is a nice and elegant short method that will do what you ask using a regular expression:

private IEnumerable<string> GetEmails(string input)
{
    if (String.IsNullOrWhiteSpace(input)) yield break;
    MatchCollection matches = Regex.Matches(input, @"[^\s<]+@[^\s,>]+");
    foreach (Match match in matches) yield return match.Value;
}

You would call it like this:

string str = "Last, First <[email protected]>, [email protected], First Last <[email protected]>, \"First Last\" <[email protected]>, \"Last, First\" <[email protected]>";
IEnumerable<string> emails = GetEmails(str);

Please note that this regular expression does not validate the email addresses; for instance, 1@h will be considered valid and returned as a match.

Creating such a regex validator would be a difficult job and probably not the best option.

For extraction purposes, I think it is the ideal tool.

Upvotes: 1

Yan Paulo

Reputation: 459

Not exactly elegant, but try this:

private static IEnumerable<string> GetEntries(string str)
{
    List<string> entries = new List<string>();
    StringBuilder entry = new StringBuilder();
    foreach (char ch in str)
    {
        //If the character is a comma and the entry already contains an '@',
        //add this entry to the entries list and clear the temporary entry item.
        if (ch == ',' && entry.ToString().Contains("@"))
        {
            entries.Add(entry.ToString());
            entry.Clear();
        }
        //Otherwise, just add the character to the temporary entry item.
        else
        {
            entry.Append(ch);
        }
    }
    //Add the last entry, which is still in the buffer because it doesn't end with a ',' character.
    entries.Add(entry.ToString());
    return entries;
}

It splits the string on commas, but only on a comma that comes after an '@' character in the current entry.

You would call it like this:

string str = "Last, First <[email protected]>, [email protected], First Last <[email protected]>, \"First Last\" <[email protected]>, \"Last, First\" <[email protected]>";
var entries = GetEntries(str);

Upvotes: 0

Yacoub Massad

Reputation: 27871

Try this:

public List<string> ExtractEmails(string emails)
{
    List<string> result = new List<string>();

    while (true)
    {
        // Use the char overloads: ordinal search, no culture-sensitive comparison.
        int position_of_at = emails.IndexOf('@');

        if (position_of_at == -1)
        {
            break;
        }

        int position_of_comma = emails.IndexOf(',', position_of_at);

        if (position_of_comma == -1)
        {
            result.Add(emails.Trim());
            break;
        }

        string email = emails.Substring(0, position_of_comma);

        // Trim the whitespace that follows each separating comma.
        result.Add(email.Trim());

        emails = emails.Substring(position_of_comma + 1);
    }

    return result;
}

It assumes that all emails are going to contain the @ character.

It works by treating only the commas that appear after an @ character as splitting commas; all other commas are considered part of the email.
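The same rule can also be applied in a single pass over the string instead of repeatedly taking a Substring of the remainder. Below is a minimal sketch under the same assumption: every address contains an '@' and no display name does. It also trims each entry. The names CommaAfterAtSplitter and Split are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CommaAfterAtSplitter
{
    // Hypothetical helper: a comma only separates two entries if an '@'
    // has appeared since the previous split; other commas stay in the entry.
    public static List<string> Split(string input)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(input))
        {
            return result;
        }

        int start = 0;        // start index of the current entry
        bool seenAt = false;  // has the current entry contained an '@' yet?

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '@')
            {
                seenAt = true;
            }
            else if (c == ',' && seenAt)
            {
                // Splitting comma: emit the entry without surrounding whitespace.
                result.Add(input.Substring(start, i - start).Trim());
                start = i + 1;
                seenAt = false;
            }
        }

        // The tail after the last splitting comma is kept only if it contains an '@',
        // so a trailing comma or trailing whitespace does not produce an empty entry.
        if (seenAt)
        {
            result.Add(input.Substring(start).Trim());
        }

        return result;
    }
}
```

With hypothetical example.com addresses, Split("Last, First <last@example.com>, \"Last, First\" <lf@example.com>,") returns two entries. The comma inside each display name is kept because no '@' has been seen yet in that entry.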

Upvotes: 4
