Ricky
Ricky

Reputation: 275

Remove Hidden Non-ASCII Characters [PHP or MySQL]

I'm having a problem with hidden non-ASCII characters (spaces) in my database.

How can I replace them with normal spaces and convert them before being inserted to avoid future problems?

I'm still not 100% sure what's happening, but I think it's with the non-ASCII spaces. Any advice to help track it down will help.


Here's what's happening:

I have a database with keywords and if I search for "test keyword", nothing shows up. I know for a fact that "test keyword" is in the database.

If I search for "test" or "keyword", it will show up.

If I do a query with:

SELECT * FROM keywords WHERE keyword regexp '[^ -~]'; (found here)

It will display "test keyword" - giving me the conclusion there is a non-ASCII character with the space in "test keyword".

Upvotes: 4

Views: 14886

Answers (4)

bfilipesoares
bfilipesoares

Reputation: 147

I had the same issue and was able to create a update query to replace (in my case) non breaking spaces.

First I analyzed the binary values of the strings that had those chars (I used Mysql workbench 'Open value in editor" to do so). I realized that in my case the char(s) that I wanted to replace had a hex value of 'a0'.

Next I went to this page http://www.fileformat.info/info/unicode/char/a0/charset_support.htm and checked all the encodings that interpret a0 as a non breaking space.

Next I built this query

UPDATE keywords SET keyword = TRIM(REPLACE(keyword, CONVERT(char(160) USING hp8), ' '));

, I chose hp8 but utf8 worked as well.

It took me some time to reach this solution...so I hope this helps someone with the same problem, not to lose his mind trying to figure a solution.

Upvotes: 3

ErichBSchulz
ErichBSchulz

Reputation: 15669

How about:

update keywords
set keyword = replace(keyword, char(160), ' ') 
WHERE keyword LIKE concat('%',char(160),'%');

Upvotes: 1

Ricky
Ricky

Reputation: 275

This works with PHP:

str_replace("\xA0", ' ', $keyword)

Now i'm trying to replace all existing ones in the database.

I think this should be working, but it's not:

update keywords set keyword = replace(keyword, char(160), " ") WHERE keyword regexp char(160);

Any ideas?

Upvotes: 5

apscience
apscience

Reputation: 7253

Do you want to remove all non alphanumeric characters?

$string = “Here! is some text, and numbers 12345, and symbols !£$%^&”;

$new_string = preg_replace(“/[^a-zA-Z0-9\s]/”, “”, $string);

Upvotes: 0

Related Questions