wanderingandy

Reputation: 873

Removing non-alphanumeric characters with sed

I am trying to validate some input by removing a set of characters. Only alphanumeric characters plus the period, underscore and hyphen are allowed. I've tested the regex [^\w.-] at http://gskinner.com/RegExr/ and it matches what I want removed, so I'm not sure why sed is returning the opposite. What am I missing?

My end goal is to input "Â10.41.89.50 " and get "10.41.89.50".

I've tried:

echo "Â10.41.89.50 " | sed s/[^\w.-]//g
returns Â...

echo "Â10.41.89.50 " | sed s/[\w.-]//g
and
echo "Â10.41.89.50 " | sed s/[\w^.-]//g
both return Â10418950

I tried the answer to the question "Skip/remove non-ascii character with sed", but nothing was removed.

Upvotes: 61

Views: 106807

Answers (6)

Iwan Plays

Reputation: 29

s/[^[:alnum:]+._-]//g

removes anything other than alphanumeric and ".+_-" characters.

echo "Â10.41.89.50 +-_" | sed 's/[^[:alnum:]+._-]//g'
Â10.41.89.50+-_
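The output above still contains Â because, in a UTF-8 locale, [:alnum:] also covers accented letters. A minimal sketch, assuming the input is UTF-8 encoded: forcing the C locale makes [:alnum:] ASCII-only, so both bytes of the encoded Â fall outside the class and are deleted. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. LC_ALL=C makes [:alnum:] match only ASCII letters and
# digits, so each byte of a UTF-8 "Â" is removed along with the space.
sanitize_sed() {
  printf '%s\n' "$1" | LC_ALL=C sed 's/[^[:alnum:]+._-]//g'
}

sanitize_sed 'Â10.41.89.50 +-_'   # prints: 10.41.89.50+-_
```

The trade-off is that legitimate non-ASCII letters are stripped as well, which is what the question asks for.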

Upvotes: 2

panticz

Reputation: 2315

To remove all characters except alphanumerics and "-", use this code:

echo "a b-1_2" | sed "s/[^[:alnum:]-]//g"
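The hyphen is taken literally here only because it is the last character in the bracket expression. A sketch of the same pattern extended with the other characters the question allows (period and underscore), assuming a POSIX sed; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: keep letters, digits, '.', '_' and '-'.
# '-' stays literal because it is last inside the brackets.
keep_safe() {
  printf '%s\n' "$1" | sed 's/[^[:alnum:]._-]//g'
}

keep_safe 'a b-1_2'    # prints: ab-1_2
keep_safe 'x.y z!-'    # prints: x.yz-
```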

Upvotes: 15

iruvar

Reputation: 23374

tr's -c (complement) flag may be an option:

echo "Â10.41.89.50-._ " | tr -cd '[:alnum:]._-'
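With -c the set is complemented and with -d the matching characters are deleted, so everything outside [:alnum:] and ._- goes, including the trailing newline. Implementations may differ on multibyte input (GNU tr works byte by byte, while some BSD versions may treat [:alnum:] as locale-aware), so this sketch pins LC_ALL=C to keep the classes ASCII-only. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. -c complements the set, -d deletes what is left;
# LC_ALL=C ensures the bytes of a UTF-8 "Â" are not counted as alphanumeric.
sanitize_tr() {
  printf '%s' "$1" | LC_ALL=C tr -cd '[:alnum:]._-'
  printf '\n'   # tr also deleted the newline, so add one back
}

sanitize_tr 'Â10.41.89.50-._ '   # prints: 10.41.89.50-._
```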

Upvotes: 89

technerdius

Reputation: 353

[[:alnum:]_.@]

This worked just fine for me. It preserved all of the characters I specified for my purposes.
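The answer shows only the bracket expression. Assuming it is meant as the set of characters to keep, it would go inside a negated bracket expression in sed. A sketch with a hypothetical helper name and made-up input:

```shell
#!/bin/sh
# Hypothetical helper. [^...] negates the set: every character that is NOT a
# letter, digit, '_', '.' or '@' is deleted.
keep_email_chars() {
  printf '%s\n' "$1" | sed 's/[^[:alnum:]_.@]//g'
}

keep_email_chars 'user name@example.com!'   # prints: username@example.com
```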

Upvotes: 0

gniourf_gniourf

Reputation: 46813

You might want to use the [:alpha:] class instead:

echo "Â10.41.89.50 " | sed "s/[[:alpha:]]//g"

should work. If not, you might need to change your locale settings.

On the other hand, if you only want to keep the digits, hyphens and periods:

echo "Â10.41.89.50 " | sed "s/[^[:digit:].-]//g"

If your string is in a variable, you can use pure bash and parameter expansions for that:

$ dirty="Â10.41.89.50 "
$ clean=${dirty//[^[:digit:].-]/}
$ echo "$clean"
10.41.89.50

or

$ dirty="Â10.41.89.50 "
$ clean=${dirty//[[:alpha:]]/}
$ echo "$clean"
10.41.89.50

You can also have a look at 1_CR's answer.
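Why the question's [\w.-] misbehaves: in a POSIX bracket expression a backslash is an ordinary character, so [\w.-] is the four-character set backslash, w, period and hyphen, not "word characters". That matches what the question observed (periods removed, digits kept). A small check, assuming a POSIX-conforming sed such as GNU or BSD sed:

```shell
#!/bin/sh
# Inside [...] the backslash is literal: the set is '\', 'w', '.', '-'.
printf '%s\n' 'w\9.-x' | sed 's/[\w.-]//g'            # prints: 9x

# A POSIX character class is the bracket-expression way to say "letters and digits".
printf '%s\n' 'w\9.-x' | sed 's/[^[:alnum:]._-]//g'   # prints: w9.-x
```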

Upvotes: 28

anubhava

Reputation: 784888

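Well, how sed handles Unicode characters depends on the implementation and the locale, so it may not treat characters like Â the way you expect. Use perl instead: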

> s="Â10.41.89.50 "
> perl -pe 's/[^\w.-]+//g' <<< "$s"
10.41.89.50

Upvotes: 7
