Reputation: 719
I have a string like
$str = "hyper text-markup_language";
$keywords = preg_split("/[_,-, ]+/", $str);
I used preg_split, but it splits the string on the underscore and the dash, not on the space.
I want a result like this:
[0] = hyper
[1] = text
[2] = markup
[3] = language
Upvotes: 2
Views: 5829
Reputation: 626689
Your [_,-, ]+ pattern matches one or more symbols that are either _, a space or a comma; it does not match a hyphen. The reason is that [,-,] creates a range between a comma and a comma, thus matching only a comma.
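To check this claim, here is a minimal sketch that tests each candidate character against the original character class on its own. The `^...$` anchors and the `$matches` variable are added for illustration only; they are not part of the question's code.

```php
<?php
// Probe which single characters the original class [_,-, ] accepts.
// Anchors are added so that exactly one character is tested at a time.
$pattern = '/^[_,-, ]$/';

$matches = [];
foreach (['_', ',', ' ', '-'] as $char) {
    $matches[$char] = preg_match($pattern, $char) === 1;
}

// The hyphen is the only one of the four that is not matched,
// because ,-, is read as a range from comma to comma.
var_export($matches);
```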
You may use [\s_-]+ as the regex pattern to match one or more (due to the + quantifier) symbols from the set: whitespace (matched with \s), _ or - (at the end of the character class, - is parsed as a literal hyphen).
$str = "hyper text-markup_language";
$res = preg_split('~[\s_-]+~', $str, 0, PREG_SPLIT_NO_EMPTY);
print_r($res);
// => Array ( [0] => hyper [1] => text [2] => markup [3] => language )
You may read on character classes at regular-expressions.info.
Upvotes: 1
Reputation: 2561
@user3056158 you can also do it without preg_split(), like below:
<?php
$str = "hyper text-markup_language";
$str = str_replace(array(" ", "-", "_"), " ", $str);
echo "<pre>";
print_r(explode(" ", $str));
?>
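One thing to keep in mind with this approach: explode() splits at every single separator, so repeated separators leave empty strings in the result. A minimal sketch, assuming a hypothetical input with doubled separators (not the string from the question), shows the effect and one possible way to clean it up:

```php
<?php
// Hypothetical input: two spaces and two dashes in a row.
$str = "hyper  text--markup_language";

// Same idea as above: normalise every separator to a space, then explode.
$viaExplode = explode(" ", str_replace(array("-", "_"), " ", $str));

// explode() leaves an empty string between adjacent separators.
// array_filter() with strlen drops them; array_values() reindexes from 0.
$cleaned = array_values(array_filter($viaExplode, 'strlen'));

print_r($viaExplode);
print_r($cleaned);
```

A regex with the + quantifier avoids this extra step, since a run of separators is treated as one split point.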
Upvotes: 2
Reputation: 3968
Nice and simple solution.
<?php
$str = "hyper text-markup_language";
$arr = preg_split("/[_,\- ]+/", $str);
var_dump($arr);
?>
This produces this output.
array (size=4)
0 => string 'hyper' (length=5)
1 => string 'text' (length=4)
2 => string 'markup' (length=6)
3 => string 'language' (length=8)
The issue was that when you wrote the - character between the two commas, the regex read it as a range from the comma to the comma (which is just a comma).
Escaping the hyphen and removing the duplicate comma (the square brackets mean a list of anything inside) will produce the expected array.
Square brackets are referred to as character sets (or character classes).
They match any single character listed inside them. See this example.
/gr[ae]y/
This will match gray and grey, because the square brackets match either the a or the e. Changing the above to /gr[a-e]y/ would mean that gray, grby, grcy, grdy, and grey would all match. This is because the hyphen (-) is a special character inside square brackets: it creates a range from the character before the hyphen to the character after it.
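The difference between the list and the range can be checked with a short sketch. The word list (including grfy and gr-y as negative cases) and the variable names are chosen here for illustration, and the patterns are anchored so that the whole word must match.

```php
<?php
// Sample words; grfy and gr-y are included as cases that should fail.
$words = ['gray', 'grby', 'grey', 'grfy', 'gr-y'];

$set = [];    // results for the explicit list [ae]
$range = [];  // results for the range [a-e]
foreach ($words as $word) {
    $set[$word]   = preg_match('/^gr[ae]y$/', $word) === 1;
    $range[$word] = preg_match('/^gr[a-e]y$/', $word) === 1;
}

// [ae] accepts only gray and grey; [a-e] also accepts grby.
// Neither accepts gr-y: inside a range, the hyphen is not a literal.
var_export($set);
var_export($range);
```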
An alternative (following @anubhava's comment) is to put the hyphen at the beginning or end of the character set so that it does not need escaping: it cannot create a range if there is nothing in front of or behind it.
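As a rough comparison, the sketch below runs the same split with the hyphen escaped, placed first, and placed last. The array keys and variable names are illustrative, and the comma is left out of the class since the input contains none.

```php
<?php
$str = "hyper text-markup_language";

// Three ways of writing a literal hyphen inside a character set.
$patterns = [
    'escaped' => '/[_\- ]+/',  // backslash-escaped in the middle
    'first'   => '/[-_ ]+/',   // hyphen as the first character
    'last'    => '/[_ -]+/',   // hyphen as the last character
];

$results = [];
foreach ($patterns as $label => $pattern) {
    $results[$label] = preg_split($pattern, $str);
}

// All three variants split the string into the same four words.
print_r($results);
```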
Upvotes: 9