circler

Reputation: 369

Trim Non-alphanum from beginning and end of string

What is the best way to trim ALL non-alphanumeric characters from the beginning and end of a string? I tried manually adding the characters I don't need, but it doesn't work well and use the . I just need to trim anything that is not alphanumeric.

I tried using this function:

   string something = "()&*1@^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
   string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");

But it removes such characters everywhere in the string, not just at the beginning and end. What I want is this:

"test1" -> test1
#!@!2test# -> 2test
(test3) -> test3
@@test4---- -> test4

I do want to support Unicode characters, but not symbols.

EDIT: The output of the example should be:

Littering aaaannnndóú

Regards

Upvotes: 4

Views: 1963

Answers (8)

Gene R

Reputation: 3744

A non-regex one-liner:

testString = testString.Trim(testString.Where(p => !char.IsLetterOrDigit(p)).ToArray());
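A self-contained sketch of this one-liner, run against the sample inputs from the question (the function name TrimNonAlphanumeric is only an illustrative choice). It works because String.Trim(char[]) removes any leading or trailing character that appears in the array, and the array here holds every non-alphanumeric character found in the string.

```csharp
using System;
using System.Linq;

// Collect every character that is not a Unicode letter or decimal digit,
// then let String.Trim strip those characters from both ends.
static string TrimNonAlphanumeric(string input) =>
    input.Trim(input.Where(c => !char.IsLetterOrDigit(c)).ToArray());

string[] samples = { "\"test1\"", "#!@!2test#", "(test3)", "@@test4----" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {TrimNonAlphanumeric(sample)}");
```

If the string contains no non-alphanumeric characters, the array is empty and Trim falls back to trimming whitespace, which is harmless here because whitespace would itself count as non-alphanumeric.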

Upvotes: 0

Snowman

Reputation: 1543

Without using regex, in Java you could do the following (in C# the syntax would be nearly the same, with the same functionality):

static String trimNonAlphanumeric(String word) {
    while (true) {
        if (word.length() == 0) {
            return ""; // nothing alphanumeric left
        }

        if (!Character.isLetterOrDigit(word.charAt(0))) {
            word = word.substring(1);
            continue; // trim the front first
        }
        if (!Character.isLetterOrDigit(word.charAt(word.length() - 1))) {
            word = word.substring(0, word.length() - 1);
            continue; // then trim the end
        }
        break; // both ends are now alphanumeric
    }
    return word;
}
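Since the answer notes that C# would be nearly the same, here is a hedged C# sketch of that loop. Rather than calling Substring once per removed character, it moves two indices inward and takes a single Substring at the end. It uses char.IsLetterOrDigit so digits are kept, as the question's "2test" example requires; the function name is hypothetical.

```csharp
using System;

// Hypothetical C# port of the loop above, using indices instead of
// repeated Substring calls.
static string TrimNonAlphanumeric(string word)
{
    int start = 0;
    int end = word.Length - 1;

    // Skip non-alphanumeric characters at the front.
    while (start <= end && !char.IsLetterOrDigit(word[start]))
        start++;

    // Skip non-alphanumeric characters at the back.
    while (end >= start && !char.IsLetterOrDigit(word[end]))
        end--;

    // If nothing alphanumeric was found, the length is 0 and the result is "".
    return word.Substring(start, end - start + 1);
}

Console.WriteLine(TrimNonAlphanumeric("#!@!2test#")); // 2test
```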

Upvotes: 0

Ron Rosenfeld

Reputation: 60224

You could also replace any run of non-letters/non-numbers at the beginning or end of the line:

^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$

used as

 resultString = Regex.Replace(subjectString, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "", RegexOptions.Multiline);
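A runnable sketch of the answer's pattern applied to the question's inputs. It omits RegexOptions.Multiline so that ^ and $ refer to the whole string rather than to each line; the wrapper name TrimNonAlphanumeric is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Replace a run of non-letter/non-number characters anchored at the start (^)
// or at the end ($) of the string with nothing. Without RegexOptions.Multiline,
// the whole string is trimmed rather than each line.
static string TrimNonAlphanumeric(string input) =>
    Regex.Replace(input, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "");

string[] samples = { "\"test1\"", "#!@!2test#", "(test3)", "@@test4----" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {TrimNonAlphanumeric(sample)}");
```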

If you only want to remove characters at the beginning and end of the whole string, rather than line by line, drop the RegexOptions.Multiline option (which makes ^ and $ match at line breaks).

If you also want leading or trailing underscores to be retained, you could simplify the regex to:

^\W+|\W+$

The core of the regex:

[^\p{L}\p{N}]

is a negated character class that matches any character that is NOT in the Unicode Letter category \p{L} or the Unicode Number category \p{N}.

In other words:

Trim characters that are not Unicode letters or numbers

^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$

Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Parentheses capture

Match this alternative «^[^\p{L}\p{N}]*»
   Assert position at the beginning of a line «^»
   Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      A character from the Unicode category “letter” «\p{L}»
      A character from the Unicode category “number” «\p{N}»
Or match this alternative «[^\p{L}\p{N}]*$»
   Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      A character from the Unicode category “letter” «\p{L}»
      A character from the Unicode category “number” «\p{N}»
   Assert position at the end of a line «$»

Created with RegexBuddy

Upvotes: 0

Douglas

Reputation: 54887

Assuming you want to trim non-alphanumeric characters from the start and end of your string:

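// TakeWhile would stop at the first interior space/symbol, so trim the tail via Reverse.
s = new string(s.SkipWhile(c => !char.IsLetterOrDigit(c))
                .Reverse()
                .SkipWhile(c => !char.IsLetterOrDigit(c))
                .Reverse()
                .ToArray());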

Upvotes: 3

alpha bravo

Reputation: 7948

You could use this pattern:

^[^[:alnum:]]+|[^[:alnum:]]+$  

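with the g (global) flag. POSIX classes such as [:alnum:] work in PCRE-style engines, but .NET's Regex does not support them; in C#, use [^\p{L}\p{N}] instead.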

Upvotes: -1

walid toumi

Reputation: 2272

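result = Regex.Replace(input, @"[^\p{L}\s-]+(test\d*)|(test\d*)[^\p{L}\s-]+", "$1$2");

The second alternative captures into group 2, so the replacement needs $1$2 rather than $1. Note that this pattern only matches the literal word test followed by digits, so it targets the sample inputs rather than arbitrary strings.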

Upvotes: 1

Pierre-Luc Pineault

Reputation: 9201

If you need to remove any character which is not alphanumeric, you can use IsLetterOrDigit paired with a Where to go through every character. And because we're working at the char level, we'll need a little Concat at the end to bring everything back into a string.

string result = string.Concat(input.Where(char.IsLetterOrDigit));

which you can easily convert into an extension method

public static class Extensions
{
    public static string ToAlphaNum(this string input)
    {
        return string.Concat(input.Where(char.IsLetterOrDigit));
    }
}

that you can use like this:

string testString = "#!@!\"(test123)\"";
string result = testString.ToAlphaNum(); //test123

Note: this removes every non-alphanumeric character from the string. If you need to remove only the ones at the beginning and end, please add more details about what defines a beginning or an end, along with more examples.
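For the beginning/end case, one possible sketch (a hypothetical TrimNonAlphaNum helper, not part of the answer above) keeps everything between the first and the last letter or digit, using Array.FindIndex and Array.FindLastIndex. It is written as a plain function here; the same body could go into the Extensions class with a this string parameter.

```csharp
using System;

// Hypothetical ends-only variant: keep everything between the first and the
// last letter/digit, so interior symbols and spaces are preserved.
static string TrimNonAlphaNum(string input)
{
    char[] chars = input.ToCharArray();
    int first = Array.FindIndex(chars, c => char.IsLetterOrDigit(c));
    if (first < 0)
        return string.Empty; // no letters or digits at all

    int last = Array.FindLastIndex(chars, c => char.IsLetterOrDigit(c));
    return input.Substring(first, last - first + 1);
}

Console.WriteLine(TrimNonAlphaNum("#!@!\"(test123)\""));
```

Unlike the Where/Concat version, this keeps interior characters such as spaces, so "a b!c" stays unchanged.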

Upvotes: 0

Sudhakar Tillapudi

Reputation: 26209

You can use the String.Trim(Char[]) method from the .NET library to trim the unwanted characters from the given string.

From MSDN : String.Trim Method (Char[])

Removes all leading and trailing occurrences of a set of characters specified in an array from the current String object.

Before trimming, you first need to identify which characters are not letters or digits; those non-alphanumeric characters can then be passed to String.Trim(Char[]) to be removed.

Use the Char.IsLetterOrDigit() method to check whether a character is alphanumeric.

From MSDN: Char.IsLetterOrDigit()

Indicates whether a Unicode character is categorized as a letter or a decimal digit.

Try This:

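string str = "()&*1@^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
// Collect every non-alphanumeric character first, then trim them all in one call.
// Trimming one character at a time can leave an end untrimmed (e.g. "a#!" stays "a#").
var unwanted = new List<char>();
foreach (char ch in str)
{
    if (!char.IsLetterOrDigit(ch))
        unwanted.Add(ch);
}
str = str.Trim(unwanted.ToArray());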

Output:

1@^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9

Upvotes: 0
