Squivo

Reputation: 23

Split substrings within a string in C#

I need to check a string that contains a list of emails. These emails are usually separated by commas, but I need to check whether somewhere in that list there is a delimiter other than a comma. Here's an example:

[email protected],[email protected],[email protected]#[email protected]

I need to identify that different character and replace it with a comma.

I cannot just use a regex to find special characters other than the comma and replace them, because emails may legitimately contain some of these characters. So I need to find whatever sits between two emails. I wrote the following regex to identify an email, and I believe it will cover most addresses:

^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@[a-z0-9]+(\.[a-z0-9]+)+$

But I'm a little lost on how to use it to solve my problem in C#. I need to capture whatever lies between two matches of this regex and replace it with a comma.

Could anyone help me? Thank you.

Upvotes: 2

Views: 181

Answers (5)

Squivo

Reputation: 23

Thanks for the replies.

The string must have only commas as the delimiter.

The example I mentioned was only an illustration. The list was generated by a jQuery plugin with a flaw that was noticed only after it had allowed entries like "[email protected]@email.com", or other non-standard combinations such as "[email protected],[email protected]", to be saved in the list.

My main concern is cases like "[email protected]/[email protected]"

I'm trying to automate a search for this kind of inconsistency, as a preventive measure.

I thought about using regex, but I really do not know if it is the best approach. Since this is not a critical part of the system, I am now thinking it would be simpler to keep a list of invalid characters and replace each of them with a comma.

But I will try vks's solution.
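For reference, here is a minimal sketch of that invalid-character idea. The character set (`/`, `#`, `;`, `|`, whitespace) and the names `InvalidCharFixer` and `Normalize` are assumptions made for illustration, not something fixed by the problem. Note that the email regex in the question allows `#`, `/` and `|` in the local part, so an address that really uses them would be broken by this.

```csharp
using System.Text.RegularExpressions;

public static class InvalidCharFixer
{
    // Hypothetical set of characters treated as stray delimiters.
    private static readonly Regex Invalid = new Regex(@"[/#;|\s]");

    // Replaces every listed character with a comma, wherever it occurs.
    public static string Normalize(string emails)
    {
        return Invalid.Replace(emails, ",");
    }
}
```

With this sketch, `InvalidCharFixer.Normalize("a@x.com/b@y.com;c@z.com")` gives `a@x.com,b@y.com,c@z.com`.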

Thank you all.

Upvotes: 0

Rotem

Reputation: 21917

Your problem is unsolvable in the general case because the delimiter cannot always be determined, even by a human.

Consider this input where the delimiter is a .:

[email protected]@otherServer.com

Is this:

[email protected] | [email protected]

or is it:

[email protected] | [email protected]

Or this input:

[email protected]@otherServer.com

Is it delimiter u:

[email protected] | [email protected]

Or delimiter t:

[email protected] | [email protected]

If you're not willing to accept a certain percentage of failures, you're better off looking for ways not to receive this input to begin with.

Upvotes: 7

vks

Reputation: 67968

([^@,]+@[^.]+\.\w{3}(?!,|$)).

Try this. Replace each match with "$1," (the captured address followed by a comma). See the demo.

http://regex101.com/r/tF4jD3/15

P.S. This works for email IDs of the format [email protected].
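In C#, the pattern above can be applied with `Regex.Replace`. This is only a sketch: the class and method names (`DelimiterFixer`, `Normalize`) are made up for illustration, and it inherits the pattern's limits. It expects a single-label host and a three-character TLD, so a longer TLD would have its fourth character treated as the delimiter.

```csharp
using System.Text.RegularExpressions;

public static class DelimiterFixer
{
    // Pattern from the answer: capture an address of the form name@host.tld
    // (three-character TLD) that is NOT followed by a comma or end of input,
    // then consume the single character that follows it.
    private static readonly Regex StrayDelimiter =
        new Regex(@"([^@,]+@[^.]+\.\w{3}(?!,|$)).");

    // Replaces the stray character after each captured address with a comma.
    public static string Normalize(string emails)
    {
        return StrayDelimiter.Replace(emails, "$1,");
    }
}
```

For example, `DelimiterFixer.Normalize("a@x.com,b@y.com#c@z.com")` gives `a@x.com,b@y.com,c@z.com`.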

Upvotes: 0

OrangeKing89

Reputation: 704

One option you could try is to split the incoming string on the @ symbol and check that each part of the resulting array has a comma in it, except the first and last.

If you find one that is missing the comma, search for the .com, .net or .org in that element and stick a comma after it.

Lastly, just join the list back together with the @ symbol.
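A rough sketch of those steps in C# could look like this. The TLD list and the names `AtSplitFixer` and `Normalize` are placeholders, and the sketch assumes that exactly one stray character follows the TLD and that this character should be replaced by the comma rather than kept.

```csharp
using System;

public static class AtSplitFixer
{
    // Hypothetical TLD list; extend it to cover the domains you expect.
    private static readonly string[] Tlds = { ".com", ".net", ".org" };

    public static string Normalize(string emails)
    {
        string[] parts = emails.Split('@');

        // Every part except the first and last spans "domain<delimiter>local".
        for (int i = 1; i < parts.Length - 1; i++)
        {
            if (parts[i].Contains(",")) continue;

            foreach (string tld in Tlds)
            {
                int at = parts[i].IndexOf(tld, StringComparison.OrdinalIgnoreCase);
                int delimiter = at + tld.Length;
                if (at < 0 || delimiter >= parts[i].Length) continue;

                // Assumption: the one character after the TLD is the delimiter.
                parts[i] = parts[i].Substring(0, delimiter) + ","
                         + parts[i].Substring(delimiter + 1);
                break;
            }
        }

        // Put the pieces back together with the @ symbol.
        return string.Join("@", parts);
    }
}
```

Parts whose domain does not end in one of the listed TLDs are left untouched, so the list has to match the data.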

Upvotes: 0

ccalboni

Reputation: 12490

I can't think of an elegant way to achieve this. If you don't mind an inelegant solution, you can replace any top level domain plus one character with the same TLD plus comma.

You'll end up replacing ".com#" with ".com,", ".eu*" with ".eu," and so on. The replacement could be done with Regex, so the number of iterations equals the number of TLDs you want to handle.
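Something along these lines, as a sketch only. The TLD list and the names `TldDelimiterFixer` and `Normalize` are assumptions. It is as inelegant as advertised: a two-level domain such as ".com.br", or a local part that happens to contain ".com", would be corrupted.

```csharp
using System.Text.RegularExpressions;

public static class TldDelimiterFixer
{
    // Hypothetical TLD list; one Regex.Replace pass runs per entry.
    private static readonly string[] Tlds = { "com", "net", "org", "eu" };

    public static string Normalize(string emails)
    {
        foreach (string tld in Tlds)
        {
            // Match ".tld" plus any single following character
            // and rewrite it as ".tld,".
            string pattern = Regex.Escape("." + tld) + ".";
            emails = Regex.Replace(emails, pattern, "." + tld + ",");
        }
        return emails;
    }
}
```

A TLD at the very end of the string has no following character, so the last address is left alone.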

Upvotes: 0
