TurqMage

Reputation: 3321

Accepting only UTF-8 letters with preg_match

I am trying to allow Chinese, Japanese (Hiragana, Katakana, Kanji), Korean, and basically any Unicode letter. I would just like the first character to be a letter.

$pattern = '/\p{L}[\p{L}\p{N} _.-]+/u';
if(!preg_match($pattern, $subuser)){
    //Error
}

However, my pattern seems to accept strings that start with numbers. When I anchored it:

'/^\p{L}[\p{L}\p{N} _.-]+$/u'

no strings were accepted. I have tried using \p{Hiragana} etc., but with no real luck. Does anyone see what I am doing wrong?

Upvotes: 4

Views: 1642

Answers (2)

Matthew Sprankle

Reputation: 1632

The holy grail when it comes to sanitization is HTML Purifier: http://htmlpurifier.org/ It cleans the data and only lets valid UTF-8 characters through. Some recommended reading on character encodings: http://htmlpurifier.org/docs/enduser-utf8.html

Upvotes: 1

Wil Moore III

Reputation: 7204

This should do the trick:

<?php

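// Sample input: only the entries that start with a letter should survive.
$lines = array('12345', 'w123', 'hello');

$valid = array_filter($lines, function($line){
  // ^\p{L} requires the first character to be a Unicode letter; the u modifier enables UTF-8 mode.
  return preg_match('/^\p{L}/u', $line) === 1;
});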

var_dump($valid);
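
The filter above only checks the first character. If the rest of the string should also be restricted to the character class from the question, something like the following might work. This is a minimal sketch, not a definitive implementation: the function name isValidUsername is made up for illustration, and swapping + for * (so a single-letter name passes) and $ for \z (so a trailing newline is rejected) are my assumptions about what is wanted. It also handles the case where preg_match() returns false, which happens with the u modifier when the subject is not valid UTF-8. That may be why the anchored pattern in the question appeared to reject everything, but I cannot confirm it without seeing the input.

```php
<?php

// Hypothetical helper (not from the question): true only when the whole
// string starts with a letter and continues with the allowed characters.
function isValidUsername(string $name): bool
{
    // ^ and \z anchor the match to the entire string.
    // The u modifier turns on UTF-8 mode so \p{L} and \p{N} match Unicode
    // letters and numbers. * also accepts a one-letter name.
    $result = preg_match('/^\p{L}[\p{L}\p{N} _.-]*\z/u', $name);

    // preg_match() returns false on failure, for example when $name is not
    // valid UTF-8. Treat that as "not valid" instead of letting it slip by.
    if ($result === false) {
        return false;
    }

    return $result === 1;
}

$names = array('12345', 'w123', 'hello', 'こんにちは', "\xFF\xFE");
$valid = array_filter($names, 'isValidUsername');
```

Here $valid keeps 'w123', 'hello' and 'こんにちは'. When the function returns false for input that looks fine, calling preg_last_error() right after preg_match() and comparing the result with PREG_BAD_UTF8_ERROR shows whether the encoding was the problem.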

Upvotes: 2
