Reham Fahmy

Reputation: 5073

Strip out HTML and Special Characters

I'd like to use a PHP function (or anything else) to remove all HTML code and special characters, leaving only alphanumeric output.

$des = "Hello world)<b> (*&^%$#@! it's me: and; love you.<p>";

I want the output to be: Hello world it s me and love you (only A-Z, a-z, 0-9 and whitespace).

I've tried strip_tags, but it removes only the HTML tags:

$clear = strip_tags($des); 
echo $clear;

So is there any way to do it?

Upvotes: 51

Views: 121172

Answers (9)

Siddharth Shukla

Reputation: 1131

Remove all special characters without leaving extra spaces, written in a single line:

trim(preg_replace('/ +/', ' ', preg_replace('/[^A-Za-z0-9 ]/', ' ', urldecode(html_entity_decode(strip_tags($string))))));

Upvotes: 0

suika

Reputation: 1

preg_replace('/[^a-zA-Z0-9\s]/', '', $string) removes only the special characters and keeps the whitespace between words.

Upvotes: 0

All the other solutions are creepy because they assume that English is the only language in the world :)

They all strip diacritics like ç or à as well.

The perfect solution, as stated in the PHP documentation, is simply:

$clear = strip_tags($des);
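
If you do want to drop punctuation as well while keeping accented and non-Latin letters, one option is a Unicode-aware character class. The following is a minimal sketch, not something from the PHP documentation: the function name `stripToUnicodeAlnum` is made up for illustration, and it assumes the input is valid UTF-8 and that PCRE was built with Unicode support (the usual case).

```php
<?php
// Hypothetical helper: keep letters and digits from any script, plus whitespace.
function stripToUnicodeAlnum(string $text): string
{
    $text = strip_tags($text);
    // \p{L} matches any letter and \p{N} any number; the /u modifier makes
    // PCRE treat the pattern and the subject as UTF-8.
    $clean = preg_replace('/[^\p{L}\p{N}\s]/u', '', $text);
    // preg_replace() returns null on error (for example, invalid UTF-8 input).
    return $clean ?? '';
}

echo stripToUnicodeAlnum('<p>Ça va, à bientôt!</p>'), "\n"; // Ça va à bientôt
```

Unlike `[^a-zA-Z0-9]`, this class leaves ç and à alone and removes only the tags and punctuation.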

Upvotes: 6

Tom

Reputation: 1

To allow periods and any other characters, just add them to the character class:

Change '#[^a-zA-Z ]#' to '#[^a-zA-Z .()!]#'
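
To avoid hand-editing the pattern each time, the extra characters could be passed in as a parameter. This is only a sketch: `stripExcept` and its `$extra` parameter are hypothetical names, and `preg_quote()` is used so that characters with a special meaning in a regex (such as `.`, `-` or `]`) are escaped before they go into the character class.

```php
<?php
// Hypothetical helper: $extra lists the punctuation characters to keep.
function stripExcept(string $text, string $extra = ''): string
{
    // preg_quote() escapes regex metacharacters and the '#' delimiter,
    // so the characters are taken literally inside the class.
    $allowed = 'a-zA-Z ' . preg_quote($extra, '#');
    return preg_replace('#[^' . $allowed . ']#', '', $text) ?? '';
}

echo stripExcept('Hi there (friend)! 100%', '.()!'), "\n";
```

Here the letters, spaces, parentheses and exclamation mark survive, while the digits and the percent sign are removed.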

Upvotes: 0

Viktor

Reputation: 547

Here's a function I've been using, put together from various threads around the net, that removes all tags and non-letter characters and leaves you with a clean phrase. Does anyone know how to modify it to allow periods (.)? In other words, leave everything as is, but keep periods and other punctuation such as ! or a comma. Let me know.

function stripAlpha( $item )
{
    // Whitespace is collapsed by $pattern below, so it is not removed here
    // (replacing runs of whitespace with '' would join neighbouring words).
    $search     = array(
         '@<script[^>]*?>.*?</script>@si'   // Strip out javascript
        ,'@<style[^>]*?>.*?</style>@si'     // Strip style tags (lazy match, so no U modifier)
        ,'@<[\/\!]*?[^<>]*?>@si'            // Strip out HTML tags
        ,'@<![\s\S]*?--[ \t\n\r]*>@'        // Strip multi-line comments including CDATA
    );

    $pattern    = array(
         '#[^a-zA-Z ]#'                     // Non alpha characters
        ,'/\s+/'                            // More than one whitespace
    );

    $replace    = array(
         ''
        ,' '
    );

    $item = preg_replace( $search, '', html_entity_decode( $item ) );
    $item = trim( preg_replace( $pattern, $replace, strip_tags( $item ) ) );
    return $item;
}

Upvotes: 1

nodws

Reputation: 1127

You can do it in a single line :) Especially useful for GET or POST requests:

$clear = preg_replace('/[^A-Za-z0-9\-]/', '', urldecode($_GET['id']));
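
One caveat: the PHP manual notes that `$_GET` and `$_REQUEST` values are already URL-decoded, so the extra `urldecode()` call is probably unnecessary for them. A minimal sketch of the same idea as a reusable function follows; the name `cleanId` is hypothetical, and it takes a plain string so it can be tried outside a web request.

```php
<?php
// Hypothetical helper: keep only letters, digits and hyphens.
function cleanId(string $raw): string
{
    return preg_replace('/[^A-Za-z0-9\-]/', '', $raw) ?? '';
}

// In a request handler this might be called as: cleanId($_GET['id'] ?? '')
echo cleanId("abc-123'; DROP TABLE x"), "\n"; // abc-123DROPTABLEx
```

Note that this removes spaces too, so it suits identifiers and slugs rather than free text.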

Upvotes: 1

Aditya P Bhatt

Reputation: 22081

In more detail than the example above, suppose this is your string:

$string = '<div>This..</div> <a>is<a/> <strong>hello</strong> <i>world</i> ! هذا هو مرحبا العالم! !@#$%^&&**(*)<>?:";p[]"/.,\|`~1@#$%^&^&*(()908978867564564534423412313`1`` "Arabic Text نص عربي test 123 و,.m,............ ~~~ ٍ،]ٍْ}~ِ]ٍ}"; ';

Code:

echo preg_replace('/[^A-Za-z0-9 !@#$%^&*().]/u','', strip_tags($string));

Allows: English letters (Capital and small), 0 to 9 and characters !@#$%^&*().

Removes: All html tags, and special characters other than above

Upvotes: 1

Mez

Reputation: 24951

A regex replace is probably the better tool here:

// Strip HTML Tags
$clear = strip_tags($des);
// Clean up things like &amp;
$clear = html_entity_decode($clear);
// Strip out any url-encoded stuff
$clear = urldecode($clear);
// Replace non-AlNum characters with space
$clear = preg_replace('/[^A-Za-z0-9]/', ' ', $clear);
// Replace Multiple spaces with single space
$clear = preg_replace('/ +/', ' ', $clear);
// Trim the string of leading/trailing space
$clear = trim($clear);

Or, in one go:

$clear = trim(preg_replace('/ +/', ' ', preg_replace('/[^A-Za-z0-9 ]/', ' ', urldecode(html_entity_decode(strip_tags($des))))));
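
For reuse, the same steps could be wrapped in a function and run against the string from the question. This is just a sketch; the name `toPlainAlnum` is made up for illustration.

```php
<?php
// Hypothetical wrapper around the steps listed above.
function toPlainAlnum(string $text): string
{
    $text = strip_tags($text);                           // remove HTML tags
    $text = html_entity_decode($text);                   // &amp; becomes &
    $text = urldecode($text);                            // %21 becomes !
    $text = preg_replace('/[^A-Za-z0-9]/', ' ', $text);  // non-alphanumerics to spaces
    $text = preg_replace('/ +/', ' ', $text);            // collapse runs of spaces
    return trim($text);
}

$des = 'Hello world)<b> (*&^%$#@! it\'s me: and; love you.<p>';
echo toPlainAlnum($des), "\n"; // Hello world it s me and love you
```

Because each special character is replaced with a space before the runs are collapsed, "it's" comes out as "it s", which matches the output asked for in the question.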

Upvotes: 151

Matt Stein

Reputation: 3053

Strip out tags, leave only alphanumeric characters and space:

$clear = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags($des));

Edit: all credit to DaveRandom for the perfect solution...

$clear = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags(html_entity_decode($des)));
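
One plausible reason the decode step matters: if the input contains entity-encoded tags, strip_tags() does not recognise them until they are decoded. A small comparison, using a made-up input string:

```php
<?php
// Made-up input with entity-encoded tags.
$des = '&lt;b&gt;Tom&lt;/b&gt; &amp; Jerry';

// Without decoding: strip_tags() sees no tags, and the regex then removes
// "&", ";" and "/" but leaves the entity names behind as letters.
$stripOnly = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags($des));

// Decoding first: &lt;b&gt; becomes <b>, which strip_tags() then removes.
$decodeFirst = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags(html_entity_decode($des)));

var_dump($stripOnly);   // "ltbgtTomltbgt amp Jerry"
var_dump($decodeFirst); // "Tom  Jerry" (two spaces, since "&" is deleted, not replaced)
```

Because the replacement is an empty string, adjacent whitespace is not collapsed; add a second preg_replace('/\s+/', ' ', ...) if single spaces are needed.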

Upvotes: 13
