NikoKaradzhov

Reputation: 1

Regex: remove invalid names according to certain rules in C#

I want to remove every name that contains unnecessary characters, so that only valid names remain. Here are the rules for a valid name:

• Has length between 3 and 16 characters
• Contains only letters, numbers, hyphens and underscores
• Has no redundant symbols before, after or in between

This is the input:

Jeff, john45, ab, cd, peter-ivanov, @smith, sh, too_long_username, !lleg@l ch@rs, jeffbutt

My regex so far is at https://regexr.com/4ahls. I want to remove:
@smith
!lleg@l
ch@rs

Upvotes: 0

Views: 314

Answers (3)

Aleks Andreev

Reputation: 7054

You don't actually need a regex to solve this. Use good old string.Split() and filter the names:

using System;
using System.Linq;

var input = "Jeff, john45, ab, cd, peter-ivanov, @smith, sh, too_long_username, !lleg@l ch@rs, jeffbutt";
var listOfNames = input.Split(new[] {",", " "}, StringSplitOptions.RemoveEmptyEntries)
    .Where(l => l.Length >= 3 && l.Length <= 16) // filter by length (3 to 16 characters)
    .Where(l => l.All(c => char.IsDigit(c) || char.IsLetter(c) || c == '-' || c == '_')) // letters, digits, hyphens, underscores only
    .ToList();

Now you have a list of four names. To turn it back into a string, just join the names:

var singleLine = string.Join(", ", listOfNames);
// singleLine is "Jeff, john45, peter-ivanov, jeffbutt"

Upvotes: 0

Pushpesh Kumar Rajwanshi

Reputation: 18357

Your own regex, \b([a-zA-Z0-9_-]){3,16}\b, is almost good enough, but \b does not do the job here. Because @ is not a word character, the position between @ and s in @smith is a word boundary, so the regex matches smith as part of the word. You need a regex that requires each name to be preceded by a space or the start of the string, and followed by a space, a comma (some names are followed by one), or the end of the string. Try this regex:

(?<= |^)[a-zA-Z0-9_-]{3,16}(?=[ ,]|$)


This should match only the words that follow your rules.

Note: Always keep - at the very start or very end of a character set. Between two other characters it can be read as a range operator and give unexpected results.
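To make the difference concrete, here is a minimal sketch that runs both patterns over the sample input from the question. The class BoundaryDemo and the helper FindNames are hypothetical names introduced only for this illustration; the patterns themselves are the ones quoted in this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: returns the text of every match, in order.
    public static string[] FindNames(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        var input = "Jeff, john45, ab, cd, peter-ivanov, @smith, sh, too_long_username, !lleg@l ch@rs, jeffbutt";

        // Word boundaries: fragments of invalid names slip through.
        var withWordBoundary = FindNames(input, @"\b([a-zA-Z0-9_-]){3,16}\b");
        Console.WriteLine(string.Join(", ", withWordBoundary));
        // Jeff, john45, peter-ivanov, smith, lleg, jeffbutt

        // Lookarounds: only whole, delimiter-separated names match.
        var withLookarounds = FindNames(input, @"(?<= |^)[a-zA-Z0-9_-]{3,16}(?=[ ,]|$)");
        Console.WriteLine(string.Join(", ", withLookarounds));
        // Jeff, john45, peter-ivanov, jeffbutt
    }
}
```

The lookbehind and lookahead do not consume the delimiters, so each match value is the bare name and can be joined back with string.Join.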

Upvotes: 1

Michał Turczyn

Reputation: 37367

You could try this pattern: (?=^[a-zA-Z0-9-_]{3,16}$).+

Positive lookaheads (?=...) are generally used to assert that some rules hold, which is what you want to do here. Explanation:

^ - match beginning of a string

[a-zA-Z0-9-_]{3,16} - match at least 3 and at most 16 characters from the character class: a-zA-Z - all ASCII letters, 0-9 - digits, -_ - hyphen or underscore

$ - end of a string

And if this assertion is successful, then match everything with .+

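Because the pattern is anchored with ^ and $, it validates one name at a time rather than the whole comma-separated line. A minimal sketch of one way to apply it, assuming the input is first split on commas and spaces (the class NameValidator and the method KeepValid are hypothetical names for this illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // The pattern from this answer, applied to a single candidate name.
    private static readonly Regex ValidName =
        new Regex(@"(?=^[a-zA-Z0-9-_]{3,16}$).+");

    // Hypothetical helper: splits the line and keeps only the names that match.
    public static string[] KeepValid(string input)
    {
        return input
            .Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(name => ValidName.IsMatch(name))
            .ToArray();
    }

    public static void Main()
    {
        var input = "Jeff, john45, ab, cd, peter-ivanov, @smith, sh, too_long_username, !lleg@l ch@rs, jeffbutt";
        Console.WriteLine(string.Join(", ", KeepValid(input)));
        // Jeff, john45, peter-ivanov, jeffbutt
    }
}
```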

Upvotes: 0
