Sunil Kumar Sahoo

Reputation: 53647

How to encode a string to replace all special characters

I have a string that contains special characters, and I need to convert it into a string without any. I tried Base64, but Base64 uses the equals sign (=) for padding, which is itself a special character. I need the result to contain only alphanumeric characters. I also can't simply remove the special characters: I have to replace them so that two different input strings still produce different outputs. Which encoding will let me achieve this?

Upvotes: 4

Views: 8662

Answers (5)

Kannan Suresh

Reputation: 4580

The easiest way would be to use a regular expression to match all nonalphanumeric characters and replace them with an empty string.

// This removes every character that is not a word character (letter, digit, underscore) or whitespace.
var cleaned = stringToReplace.replace(/[^\w\s]/gm, '')

Adding a character to the character class in the regex above preserves that character instead of removing it.

// This removes the same characters as above, but also keeps the period.
var cleaned = stringToReplace.replace(/[^\w\s.]/gm, '')

A working example.

const regex = /[^\w\s]/gm;
const str = `This is a text with many special characters.
Hello, user, your password is 543#!\$32=!`;
const subst = ``;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

Regex explained.

/[^\w\s]/gm
Match a single character not present in the list below [^\w\s]
\w matches any word character (equivalent to [a-zA-Z0-9_])
\s matches any whitespace character (equivalent to [\r\n\t\f\v \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff])

Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

Upvotes: 2

Jilles van Gurp

Reputation: 8284

Apache Commons Codec has a URL-safe version of Base64, which emits - and _ instead of the + and / characters.

http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html#encodeBase64URLSafe(byte[])
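For illustration only, here is a minimal JavaScript sketch of the same URL-safe alphabet swap. It is not the Commons Codec API: it assumes Node.js (for the global Buffer), and the function names toBase64Url and fromBase64Url are made up for this example. Note that the output can still contain - and _, so it is not strictly alphanumeric.

```javascript
// Sketch only: assumes Node.js, where Buffer is a global.
// toBase64Url / fromBase64Url are illustrative names, not a library API.

function toBase64Url(text) {
  return Buffer.from(text, 'utf8')
    .toString('base64')
    .replace(/\+/g, '-')   // value 62: '+' becomes '-'
    .replace(/\//g, '_')   // value 63: '/' becomes '_'
    .replace(/=+$/, '');   // drop the trailing '=' padding
}

function fromBase64Url(encoded) {
  // Restore the standard alphabet; Node's decoder tolerates missing padding.
  const base64 = encoded.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(base64, 'base64').toString('utf8');
}

console.log(toBase64Url('???')); // Pz8_
console.log(fromBase64Url(toBase64Url('a=b&c'))); // a=b&c
```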

Upvotes: 2

Stephen C

Reputation: 718728

There are a number of variations of base64, some of which don't use padding. (You still have a couple of non-alphanumeric characters, for the values 62 and 63.)

The Wikipedia page on base64 goes into the details, including the "standard" variations used for a number of common use-cases. (Does yours match one of those?)

If your strings have to be strictly alphanumeric, then you'll need to use hex encoding (one byte becomes 2 hex digits), or roll your own encoding scheme. Your stated requirements are rather unusual ...

Upvotes: 2

Jon Skeet

Reputation: 1499900

The simplest option would be to encode the text to binary using UTF-8, and then convert the binary back to text as hex (two characters per byte). It won't be terribly efficient, but it will just be alphanumeric.
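
A rough JavaScript sketch of the UTF-8-then-hex approach described above. The function names toHex and fromHex are invented for this example, and it assumes a runtime that provides TextEncoder and TextDecoder (modern browsers and recent Node.js versions do).

```javascript
// Sketch only: toHex / fromHex are illustrative names.

function toHex(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  // Each byte becomes exactly two lowercase hex digits.
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

function fromHex(hex) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return new TextDecoder().decode(bytes); // UTF-8 bytes -> string
}

console.log(toHex('a=b'));          // 613d62
console.log(fromHex(toHex('a=b'))); // a=b
```

Because every byte maps to a fixed two-character code, two different inputs always yield two different outputs, at the cost of doubling the length.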

You could use base32 instead to be a bit more efficient, but that's likely to be significantly more work, unless you can find a library which supports it out of the box. (Libraries to perform hex encoding are very common.)
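
If no library is at hand, a hand-rolled base32 might look like the sketch below. It uses the RFC 4648 alphabet (A-Z and 2-7) and deliberately omits the = padding so that the output stays alphanumeric. The names BASE32_ALPHABET, toBase32 and fromBase32 are hypothetical, and TextEncoder/TextDecoder are assumed to be available.

```javascript
// Sketch only: unpadded base32 with the RFC 4648 alphabet.
const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

function toBase32(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let value = 0; // bit buffer
  let bits = 0;  // number of bits currently in the buffer
  let out = '';
  for (const byte of bytes) {
    value = (value << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the leftover bits
  }
  if (bits > 0) {
    // Pad the final group with zero bits on the right.
    out += BASE32_ALPHABET[(value << (5 - bits)) & 31];
  }
  return out;
}

function fromBase32(encoded) {
  let value = 0;
  let bits = 0;
  const bytes = [];
  for (const ch of encoded) {
    const index = BASE32_ALPHABET.indexOf(ch);
    if (index === -1) throw new Error('Invalid base32 character: ' + ch);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 255);
      bits -= 8;
    }
    value &= (1 << bits) - 1;
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}

console.log(toBase32('foobar'));           // MZXW6YTBOI
console.log(fromBase32(toBase32('a=b!'))); // a=b!
```

Every 5 input bytes become 8 output characters, compared with 10 for hex.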

Upvotes: 3

Dean Povey

Reputation: 9446

If you truly can use only alphanumeric characters, you will have to come up with an escaping scheme that uses one of those characters as the escape. For example, use 0 as the escape, and encode each special character as 0 followed by the two-digit hex value of its ASCII code. Use 000 to mean a literal 0.

e.g.

This is my special sentence with a 0.

encodes to:

This020is020my020special020sentence020with020a02000002e
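
One way this scheme could be sketched in JavaScript is shown below. The function names escapeEncode and escapeDecode are made up for this example. It handles ASCII input only, and it rejects the NUL character as an assumption of this sketch, because NUL would otherwise collide with the 000 code reserved for a literal 0.

```javascript
// Sketch only: escapeEncode / escapeDecode are illustrative names.

function escapeEncode(text) {
  let out = '';
  for (const ch of text) {
    if (ch === '0') {
      out += '000'; // a literal 0 is written as 000
    } else if (/[A-Za-z1-9]/.test(ch)) {
      out += ch; // other alphanumerics pass through unchanged
    } else {
      const code = ch.charCodeAt(0);
      if (code === 0 || code > 0x7f) {
        throw new RangeError('Only non-NUL ASCII input is supported');
      }
      // Escape character 0, then two hex digits of the ASCII code.
      out += '0' + code.toString(16).padStart(2, '0');
    }
  }
  return out;
}

function escapeDecode(encoded) {
  let out = '';
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === '0') {
      const hex = encoded.slice(i + 1, i + 3);
      out += hex === '00' ? '0' : String.fromCharCode(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      out += encoded[i];
    }
  }
  return out;
}

const sample = 'This is my special sentence with a 0.';
console.log(escapeEncode(sample));
// This020is020my020special020sentence020with020a02000002e
console.log(escapeDecode(escapeEncode(sample)) === sample); // true
```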

Upvotes: 0
