user2375263

Reputation: 195

PHP character validation

I have an HTML form that sends emails directly from my page to my email address. The code for the form and for sending the email is posted below.

I want my form to also accept the characters Č, č, Ž, ž, Š, š, Ć, ć... I know where the allowed characters are defined in the code, but since I have very little experience with PHP, I don't know how to add other characters to the existing ones.

Also, it seems to me that the code only checks the fields "name", "email" and "url", and not "comments" and "subject". Am I correct?

<form action="<?php echo basename(__FILE__); ?>" method="post" id="signup">
    <noscript>
        <p><input type="hidden" name="nojs" id="nojs" /></p>
    </noscript>

    <div class="headerObrazca">
        <br/>
        <h3>Povpraševanje ali naročanje</h3> <!-- "Inquiry or order" -->
        <p>Vsa polja so obvezna.</p> <!-- "All fields are required." -->
    </div>
    <div class="sep"></div>

    <div class="inputs">
        <center>
            <!-- placeholder: "First and last name" -->
            <input type="text" name="name" placeholder="Ime in Priimek" autofocus value="<?php get_data("name"); ?>" /><br />
            <!-- placeholder: "Email" -->
            <input type="text" name="email" id="email" placeholder="E-pošta" value="<?php get_data("email"); ?>" /><br />
            <!-- placeholder: "Subject" -->
            <input type="text" name="subject" id="subject" placeholder="Zadeva" value="<?php get_data("subject"); ?>" /><br />
        </center>

        <!-- placeholder: "Your question or order" -->
        <textarea name="comments" id="comments" rows="5" cols="70" placeholder="Vaše vprašanje ali naročilo"><?php get_data("comments"); ?></textarea><br />

        <p>
            <!-- button label: "Send!" -->
            <input type="submit" name="submit" id="submit" value="Pošlji!" <?php if (isset($disable) && $disable === true) echo ' disabled="disabled"'; ?> />
        </p>
    </div>
</form>

<?php

$yourEmail = "[email protected]"; // the email address you wish to receive these mails through
$yourWebsite = "XXX"; // the name of your website
$thanksPage = 'ponudbaHvala.php'; // URL to 'thanks for sending mail' page; leave empty to keep message on the same page 
$maxPoints = 4; // max points a person can hit before it refuses to submit - recommend 4
$requiredFields = "name,email,comments,subject"; // names of the fields you'd like to be required as a minimum, separate each field with a comma


// DO NOT EDIT BELOW HERE
$error_msg = array();
$result = null;

$requiredFields = explode(",", $requiredFields);

function clean($data) {
    $data = trim(stripslashes(strip_tags($data)));
    return $data;
}
function isBot() {
    $bots = array("Indy", "Blaiz", "Java", "libwww-perl", "Python", "OutfoxBot", "User-Agent", "PycURL", "AlphaServer", "T8Abot", "Syntryx", "WinHttp", "WebBandit", "nicebot", "Teoma", "alexa", "froogle", "inktomi", "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory", "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot", "crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz");

    // check for a missing user agent first, so stripos() never reads an unset key
    if (empty($_SERVER['HTTP_USER_AGENT']) || $_SERVER['HTTP_USER_AGENT'] == " ")
        return true;

    foreach ($bots as $bot)
        if (stripos($_SERVER['HTTP_USER_AGENT'], $bot) !== false)
            return true;

    return false;
}

if ($_SERVER['REQUEST_METHOD'] == "POST") {
    if (isBot() !== false)
        $error_msg[] = "No bots please! UA reported as: ".$_SERVER['HTTP_USER_AGENT'];

    // let's check a few things: not enough to trigger an error on their own, but worth assigning a spam score
    // the score adds up quickly, so genuine users with an 'accidental' score get through while real spam is cut out
    $points = (int)0;

    $badwords = array("adult", "beastial", "bestial", "blowjob", "clit", "cum", "cunilingus", "cunillingus", "cunnilingus", "cunt", "ejaculate", "fag", "felatio", "fellatio", "fuck", "fuk", "fuks", "gangbang", "gangbanged", "gangbangs", "hotsex", "hardcode", "jism", "jiz", "orgasim", "orgasims", "orgasm", "orgasms", "phonesex", "phuk", "phuq", "pussies", "pussy", "spunk", "xxx", "viagra", "phentermine", "tramadol", "adipex", "advai", "alprazolam", "ambien", "ambian", "amoxicillin", "antivert", "blackjack", "backgammon", "texas", "holdem", "poker", "carisoprodol", "ciara", "ciprofloxacin", "debt", "dating", "porn", "link=", "voyeur", "content-type", "bcc:", "cc:", "document.cookie", "onclick", "onload", "javascript");

    foreach ($badwords as $word)
        if (
            strpos(strtolower($_POST['comments']), $word) !== false || 
            strpos(strtolower($_POST['name']), $word) !== false
        )
            $points += 2;

    if (strpos($_POST['comments'], "http://") !== false || strpos($_POST['comments'], "www.") !== false)
        $points += 2;
    if (isset($_POST['nojs']))
        $points += 1;
    if (preg_match("/(<.*>)/i", $_POST['comments']))
        $points += 2;
    if (strlen($_POST['name']) < 3)
        $points += 1;
    if (strlen($_POST['comments']) < 15 || strlen($_POST['comments']) > 1500)
        $points += 2;
    if (preg_match("/[bcdfghjklmnpqrstvwxyz]{7,}/i", $_POST['comments']))
        $points += 1;
    // end score assignments

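    foreach($requiredFields as $field) {
        // trim() returns the trimmed string, so the result has to be stored back
        if (isset($_POST[$field]) && is_string($_POST[$field]))
            $_POST[$field] = trim($_POST[$field]);

        // add the "please fill in all fields" message only once
        if (empty($_POST[$field]) && !in_array("Prosim, izpolnite vsa polja in ponovno pošljite.", $error_msg))
            $error_msg[] = "Prosim, izpolnite vsa polja in ponovno pošljite."; // "Please fill in all fields and submit again."
    }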

    if (!empty($_POST['name']) && !preg_match("/^[a-zA-Z-'\s]*$/", stripslashes($_POST['name'])))
        $error_msg[] = "Obrazec ne sprejema posebnih znakov.\r\n"; // "The form does not accept special characters."
    if (!empty($_POST['email']) && !preg_match('/^([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@([a-z0-9])(([a-z0-9-])*([a-z0-9]))+' . '(\.([a-z0-9])([-a-z0-9_-])?([a-z0-9])+)+$/i', strtolower($_POST['email'])))
        $error_msg[] = "Vpisali ste napačno obliko E-pošte.\r\n"; // "You entered an invalid email format."
    if (!empty($_POST['url']) && !preg_match('/^(http|https):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+)(:(\d+))?\/?/i', $_POST['url']))
        $error_msg[] = "Invalid website url.\r\n";

    if ($error_msg == NULL && $points <= $maxPoints) {
        $subject = stripslashes(strip_tags( $_POST['subject'] ));

        $message = "Nekdo je izpolnil obrazec v povpraševanju: \n\n"; // "Someone filled in the inquiry form:"
        foreach ($_POST as $key => $val) {
            if (is_array($val)) {
                foreach ($val as $subval) {
                    $message .= ucwords($key) . ": " . clean($subval) . "\r\n";
                }
            } else {
                $message .= ucwords($key) . ": " . clean($val) . "\r\n";
            }
        }
        $message .= "\r\n\n\n";
        $message .= 'IP: '.$_SERVER['REMOTE_ADDR']."\r\n";
        $message .= 'Browser: '.$_SERVER['HTTP_USER_AGENT']."\r\n";
        $message .= 'Points: '.$points;

        if (strstr($_SERVER['SERVER_SOFTWARE'], "Win")) {
            $headers   = "From: $yourEmail\n";
            $headers  .= "Reply-To: {$_POST['email']}";
        } else {
            $headers   = "From: $yourWebsite <$yourEmail>\n";
            $headers  .= "Reply-To: {$_POST['email']}";
        }

        if (mail($yourEmail,$subject,$message,$headers)) {
            if (!empty($thanksPage)) {
                header("Location: $thanksPage");
                exit;
            } else {
                $result = 'Your mail was successfully sent.';
                $disable = true;
            }
        } else {
            $error_msg[] = 'Vaše sporočilo trenutno ne mora biti poslano. ['.$points.']'; // "Your message cannot be sent at the moment."
        }
    } else {
        if (empty($error_msg))
            $error_msg[] = 'Vaše sporočilo izgleda kot vsiljena pošta. Poskusite ponovno. ['.$points.']'; // "Your message looks like spam. Please try again."
    }
}
function get_data($var) {
    if (isset($_POST[$var]))
        echo htmlspecialchars($_POST[$var]);
}
?>

Upvotes: 0

Views: 2177

Answers (1)

ThW

Reputation: 19502

First, make sure that you're using UTF-8 as the encoding for your web page.
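A minimal sketch of the PHP side of this, assuming the page is generated by the same script that handles the form. The helper name isUtf8() is hypothetical; it relies on the fact that preg_match() in UTF-8 mode (the u modifier) returns false instead of a match count when the subject is not valid UTF-8.

```php
<?php
// Declare the page encoding before any output is sent, so the browser renders
// the form as UTF-8 and submits the fields as UTF-8. (The HTML <head> should
// also contain <meta charset="utf-8">.)
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: an empty pattern in UTF-8 mode matches any valid UTF-8
// string, while preg_match() returns false for malformed byte sequences.
function isUtf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}

var_dump(isUtf8('Čč Žž Šš Ćć')); // bool(true)
var_dump(isUtf8("\xC4"));        // bool(false): a lead byte without its continuation byte
```

A check like this can run before any other validation, so that malformed input is rejected early instead of confusing the later regular expressions.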

The input is validated with regular expressions (PCRE). For the name field, the check is:

!preg_match("/^[a-zA-Z-'\s]*$/", ...)

This expression is not in UTF-8 mode, so it matches individual bytes. In UTF-8, characters like Č are encoded as multiple bytes. The u modifier activates UTF-8 mode. Additionally, a literal - is safest as the first or last element of a character class: between two other characters it defines a range (like a-z).
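The byte-versus-character difference is easy to see in a quick experiment. This sketch assumes the file itself is saved as UTF-8; preg_match_all('/./su', ...) is used here only as a way to count characters without depending on the mbstring extension.

```php
<?php
// Save this file as UTF-8: 'Č' is then stored as the two bytes 0xC4 0x8C.
$letter = 'Č';

var_dump(strlen($letter));                  // int(2): strlen() counts bytes
var_dump(preg_match_all('/./su', $letter)); // int(1): in UTF-8 mode "." is one character

// Without the u modifier "." matches a single byte, so one "." cannot cover 'Č'.
var_dump(preg_match('/^.$/', $letter));     // int(0)
var_dump(preg_match('/^.$/u', $letter));    // int(1)

// The original name check sees two bytes here, neither of which is in [a-zA-Z].
var_dump(preg_match("/^[a-zA-Z-'\s]*$/", 'Čop')); // int(0)
```

The same byte counting also affects the strlen($_POST['name']) < 3 test in the question's spam score: a short name with accented letters is measured in bytes, not characters.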

!preg_match("/^[a-zA-Z'\\s-]*$/u", ...)

In this mode you can add the special characters directly to the character class. You will have to make sure that your editor/IDE saves the PHP file as UTF-8.

!preg_match("/^[a-zA-Z'\\sČ-]*$/u", ...)
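Spelled out for the letters listed in the question, an explicit character class could look like the sketch below. Only Č, č, Ž, ž, Š, š, Ć, ć are added because those are the ones the question names; any other letter has to be appended by hand. The function name isValidNameExplicit() is hypothetical.

```php
<?php
// The letters below are written directly in the pattern, so this file must be
// saved as UTF-8 and the pattern needs the u modifier.
function isValidNameExplicit(string $name): bool
{
    // a-z, A-Z, apostrophe, whitespace, the extra letters, and a literal "-" last
    return preg_match("/^[a-zA-Z'\sČčŽžŠšĆć-]*$/u", $name) === 1;
}

var_dump(isValidNameExplicit('Ana-Marija Čopič')); // bool(true)
var_dump(isValidNameExplicit("O'Brien"));          // bool(true)
var_dump(isValidNameExplicit('Janez Novak 2'));    // bool(false): digits are not allowed
var_dump(isValidNameExplicit('Müller'));           // bool(false): ü was not added
```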

Adding characters one by one makes the pattern grow quickly, and it is easy to forget one. A better solution is Unicode character properties: "\pL" matches any letter (including Cyrillic, Hangul, ...). This is the validation I suggest:

!preg_match("/^[\\pL'\\s-]*$/u", ...)
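To see how this pattern slots into the question's script, here is the name check rewritten with it. This is a sketch: $_POST is filled in by hand so the snippet runs from the command line, and the rest (the error message, stripslashes()) is kept as in the original script.

```php
<?php
// Simulated form input; in the real script these values come from the browser.
$_POST = ['name' => 'Marko Šuštar'];
$error_msg = array();

// \pL matches any Unicode letter; the u modifier makes PCRE read the subject as UTF-8.
// For malformed UTF-8 preg_match() returns false, which "!" also treats as a failure.
if (!empty($_POST['name']) && !preg_match("/^[\\pL'\\s-]*$/u", stripslashes($_POST['name'])))
    $error_msg[] = "Obrazec ne sprejema posebnih znakov.\r\n"; // "The form does not accept special characters."

var_dump(count($error_msg)); // int(0): the name is accepted
```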

You can also limit this to a specific script, such as "Latin":

!preg_match("/^[\\p{Latin}'\\s-]*$/u", ...)

Example:

// latin letters, valid: int(1)
var_dump(
  preg_match('(^\p{Latin}+$)u', 'aäČ')
);
// latin letters, invalid: int(0)
var_dump(
  preg_match('(^\p{Latin}+$)u', 'Русский')
);
// all letters, valid: int(1)
var_dump(
  preg_match('(^\pL+$)u', 'Русский')
);

Upvotes: 1
