Melanie Shebel

Reputation: 2914

Remove all non-alphabetical, non-numerical characters from a string?

If I wanted to remove characters like .!,'"^-# from an array of strings, how would I go about it while retaining all alphabetical and numeric characters?

Allowed alphabetical characters should also include letters with diacritical marks, such as à or ç.

Upvotes: 6

Views: 9081

Answers (5)

kikuchiyo

Reputation: 3421

The following will work for an array:

z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub!(/[^[:alnum:]]/, '') } # strip non-alphanumerics in place
puts z.inspect

I borrowed Jeremy's suggested regex.

Upvotes: 3

Phrogz

Reputation: 303206

If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.

foo = [ "hello", "42 cats!", "yöwza" ]

then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.

If the former (you want to 'clean' every string in the array) you could do one of the following:

foo.each { |s| s.gsub!(/\p{^Alnum}/, '') }     # Change every string in place…
bar = foo.map { |s| s.gsub(/\p{^Alnum}/, '') } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]

If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:

# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]

# Shorthand method for the same thing
bar = foo.grep(/\A\p{Alnum}+\z/)
#=> [ "hello", "yöwza" ]

In Ruby, a regular expression of the form /\A…\z/ must match the entire string: \A anchors the pattern to the start of the string and \z anchors it to the end.
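To illustrate why the string anchors matter here, the sketch below contrasts them with the line anchors ^ and $. The variable names and the multi-line input are made up for the example and are not part of the question.

```ruby
# Hypothetical input: a clean first line followed by a line with punctuation.
multiline = "hello\n42 cats!"

line_anchored   = /^\p{Alnum}+$/    # ^ and $ match at line boundaries
string_anchored = /\A\p{Alnum}+\z/  # \A and \z match only at the ends of the string

# One clean line is enough to satisfy the line-anchored pattern.
line_match = multiline.match?(line_anchored)

# The string-anchored pattern rejects it: the newline, space and "!" are not alphanumeric.
string_match = multiline.match?(string_anchored)

# A string made up only of letters (diacritics included) passes.
single_match = "yöwza".match?(string_anchored)

p [line_match, string_match, single_match]
```

String#match? requires Ruby 2.4 or later; on older versions =~ can be used instead.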

Upvotes: 1

Marc-André Lafortune

Reputation: 79562

You should use a regex with the correct character property. In this case, you can invert the Alnum property (alphabetic and numeric characters):

"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"

For more complex cases, say you also wanted to keep punctuation, you can build a set of acceptable characters like:

"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"

For the full list of character properties, you can refer to the Ruby Regexp documentation.
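As a further sketch of the same idea, the set can be extended with \p{Space} so that whitespace between words survives. The helper name keep_alnum_and_space and the sample string are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: strip everything except alphanumerics and whitespace.
def keep_alnum_and_space(str)
  str.gsub(/[^\p{Alnum}\p{Space}]/, '')
end

cleaned = keep_alnum_and_space("Marc-André, ça va?")
p cleaned # => "MarcAndré ça va"
```

Because gsub returns a new string, the original is left untouched; gsub! could be used instead if in-place modification is wanted.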

Upvotes: 18

Student

Reputation: 192

You might consider a regular expression.

http://www.regular-expressions.info/ruby.html

I'm assuming that you're using Ruby, since you tagged your post with it. You could go through the array, test each string against a regexp, and keep or remove it depending on whether it matches.

A regexp you might use could look something like this:

[^.!,^#-]

That will tell you if a character is not one of those inside the brackets. However, I suggest that you look up regular expressions; you might find a better solution once you know their syntax and usage.
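If listing the unwanted characters explicitly is preferred, one possible sketch is to let Regexp.union do the escaping. The constant names and sample words below are assumptions for the example; the character list is the one from the question. Note that anything not on the list (other symbols, for instance) is left in place, unlike the property-based approaches.

```ruby
# Characters to strip, taken from the question's example list.
UNWANTED = %w[. ! , ' " ^ - #].freeze

# Regexp.union escapes each string, so "-" and "^" are treated literally.
UNWANTED_PATTERN = Regexp.union(UNWANTED)

# Hypothetical sample data.
words = ["it's", "so-so!", "façade #1"]

stripped = words.map { |w| w.gsub(UNWANTED_PATTERN, '') }
p stripped # => ["its", "soso", "façade 1"]
```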

Upvotes: 1

Jeremy Roman

Reputation: 16345

string.gsub(/[^[:alnum:]]/, "")

Upvotes: 3
