Reputation: 27725
I'm running a regexp over some strings, and my pattern matches whitespace with \s.
Some of the strings contain strange spaces, though. Converted to hex, they show up as a0.
How can I convert all of these strange spaces to a normal space so that the regexp can detect them with \s?
When the string is displayed as UTF-8, every a0 char is shown as a �.
a03535a03832a03834a03135a02da053452e6e723aa0444ba03132a03638a03336a03933
55 82 84 15 - SE.nr: DK 12 68 36 93
Upvotes: 3
Views: 75
Reputation: 627082
You do not need to add the non-breaking space to the [\s] character class; \s can match any Unicode whitespace if you use the /u modifier:
'/\s/u'
See the regex demo
From pcre.org:
The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and space (32)... If PCRE is compiled with Unicode property support, and the PCRE_UCP option is set, the behaviour is changed so that Unicode properties are used to determine character types:
\s any character that matches \p{Z} or \h or \v
The PCRE_UCP option and Unicode semantics are enabled with the /u modifier.
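As a rough illustration of Unicode-aware \s, here is a minimal sketch in Python rather than the PCRE syntax shown above (the thread does not name a language, so Python is only a stand-in). It assumes the bytes from the question are Latin-1, since a lone a0 byte is not valid UTF-8 (UTF-8 encodes the non-breaking space as c2 a0), which would also explain the � in the question. The variable names are made up for the example.

```python
import re

# Hex dump copied from the question. Each strange space is a single 0xA0 byte.
raw = bytes.fromhex(
    "a03535a03832a03834a03135a02da053452e6e723aa0444ba03132a03638a03336a03933"
)

# Assumption: the input is Latin-1 (ISO-8859-1). Decoding maps 0xA0 to U+00A0.
text = raw.decode("latin-1")

# On str patterns Python's \s follows Unicode rules, so it matches U+00A0.
normalized = re.sub(r"\s", " ", text)

# With ASCII-only rules, \s no longer matches U+00A0.
ascii_matches = re.findall(r"\s", text, flags=re.ASCII)

print(repr(normalized))
print(ascii_matches)
```

The point of the sketch is that Unicode-aware matching works on decoded text. If the input is still raw a0 bytes, it probably needs converting to the expected encoding first.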
Upvotes: 3
Reputation: 152266
a0 is the hexadecimal representation of a non-breaking space (U+00A0).
You can match it with:
[\s\xA0]
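A minimal sketch of the same character class applied to the raw bytes, written in Python as a stand-in because the thread does not name a language. The function name normalize_spaces is hypothetical. It assumes the input is a byte string in which each non-breaking space is a single a0 byte, as in the question's hex dump.

```python
import re

def normalize_spaces(data: bytes) -> bytes:
    """Replace ASCII whitespace and 0xA0 bytes with a plain space."""
    # In a bytes pattern \s only covers ASCII whitespace, so 0xA0 is listed explicitly.
    return re.sub(rb"[\s\xA0]", b" ", data)

# Hex dump copied from the question.
raw = bytes.fromhex(
    "a03535a03832a03834a03135a02da053452e6e723aa0444ba03132a03638a03336a03933"
)

cleaned = normalize_spaces(raw)
print(cleaned)

# Without the explicit \xA0, byte-level \s finds nothing in this input.
print(re.search(rb"\s", raw))
```

This byte-level approach would not be appropriate for UTF-8 input, where a0 can occur as the second byte of an unrelated multi-byte character.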
Upvotes: 4