user3056158

Reputation: 719

How to split a string containing underscores, dashes and spaces

I have a string like this:

$str = "hyper text-markup_language";
$keywords = preg_split("/[_,-, ]+/", $str);

I used preg_split, but it splits the string on underscores and dashes, not on spaces.

I want a result like this:

[0] = hyper
[1] = text
[2] = markup
[3] = language

Upvotes: 2

Views: 5829

Answers (4)

Wiktor Stribiżew

Reputation: 626689

Your [_,-, ]+ pattern matches one or more characters that are either _, a space or a comma; it does not match a hyphen. The reason is that the ,-, part of the character class creates a range from a comma to a comma, which matches only a comma.
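To see the problem concretely, here is a small PHP sketch that runs the question's original pattern. The variable names are illustrative only; the point is that the hyphen survives inside the second piece.

```php
<?php
// The question's class holds '_', the range ',-,' (comma to comma, i.e. just ','),
// and a space. There is no literal hyphen in it.
$str = "hyper text-markup_language";
$parts = preg_split("/[_,-, ]+/", $str);
print_r($parts); // hyper, text-markup, language

// A lone hyphen is not matched by the class at all.
$hyphenMatches = preg_match("/[_,-, ]/", "-");
var_dump($hyphenMatches); // int(0)
```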

You may use [\s_-]+ as the regex pattern. It matches one or more characters (due to the + quantifier) from the set: whitespace (matched with \s), _, or - (at the end of the character class the hyphen is parsed as a literal - character).

$str = "hyper text-markup_language";
$res = preg_split('~[\s_-]+~', $str, 0, PREG_SPLIT_NO_EMPTY);
print_r($res);
// => Array ( [0] => hyper [1] => text [2] => markup [3] => language )

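A quick sketch of what the PREG_SPLIT_NO_EMPTY flag buys you, using a hypothetical input (not from the question) with a leading space, a doubled hyphen and a trailing delimiter:

```php
<?php
// Hypothetical input: leading space, "--" in the middle, "_ " at the end.
$str = " hyper--text_ ";

// Without the flag, a delimiter at the very start or end yields an empty string.
$withEmpty = preg_split('~[\s_-]+~', $str);

// With PREG_SPLIT_NO_EMPTY (limit 0 means "no limit"), only non-empty pieces are kept.
$noEmpty = preg_split('~[\s_-]+~', $str, 0, PREG_SPLIT_NO_EMPTY);

print_r($withEmpty); // '', hyper, text, ''
print_r($noEmpty);   // hyper, text
```

The + quantifier already collapses "--" into a single delimiter, so the flag only matters at the edges of the string here.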

You can read more about character classes at regular-expressions.info.

Upvotes: 1

lazyCoder

Reputation: 2561

@user3056158 you can also do it without preg_split(), like below:

<?php
  $str = "hyper text-markup_language";
  $str = str_replace(array(" ", "-", "_"), " ", $str);
  echo "<pre>";
  print_r(explode(" ", $str));
?>
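One caveat with the explode() approach: consecutive delimiters produce empty strings. A minimal sketch of this, assuming a hypothetical input with doubled delimiters and PHP 7.4+ for the arrow function used to drop the empty pieces (the " " entry in the str_replace() array above is a no-op, so it is left out here):

```php
<?php
// Hypothetical input with consecutive delimiters ("  " and "--").
$str = "hyper  text--markup_language";

// Normalize '-' and '_' to spaces, then split on single spaces.
$normalized = str_replace(array("-", "_"), " ", $str);
$parts = explode(" ", $normalized);
// hyper, '', text, '', markup, language

// Drop only empty strings (a bare array_filter() would also drop "0"),
// then reindex so the keys run 0, 1, 2, ...
$clean = array_values(array_filter($parts, fn($p) => $p !== ''));
print_r($clean); // hyper, text, markup, language
```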

Upvotes: 2

BritishWerewolf

Reputation: 3968

Nice and simple solution.

<?php
$str = "hyper text-markup_language";
$arr = preg_split("/[_,\- ]+/", $str);
var_dump($arr);
?>

This produces the following output:

array (size=4)
  0 => string 'hyper' (length=5)
  1 => string 'text' (length=4)
  2 => string 'markup' (length=6)
  3 => string 'language' (length=8)

The issue was that when you wrote the - character, the regex read it as a range from the comma to the comma (which is just a comma).

Escaping the hyphen and removing the duplicate comma (the square brackets match any one character listed inside them) produces the expected array.

RegEx explained

Square brackets are referred to as character sets (also called character classes).
A set matches any single character listed inside it. See this example.

/gr[ae]y/

This will match gray and grey, because the square brackets match either the a or the e. Changing the above to /gr[a-e]y/ would mean that gray, grby, grcy, grdy, and grey would all match. This is because inside a set the hyphen (-) is a special character that creates a range from the character before the hyphen to the character after it.
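A small PHP check of the two sets described above (the word list is just for illustration):

```php
<?php
// For each word, record [match gr[ae]y, match gr[a-e]y] as 1 or 0.
$results = [];
foreach (['gray', 'grey', 'grby', 'grfy'] as $word) {
    $results[$word] = [
        preg_match('/gr[ae]y/', $word),  // set of exactly 'a' or 'e'
        preg_match('/gr[a-e]y/', $word), // range a, b, c, d, e
    ];
}
print_r($results); // gray and grey match both; grby only the range; grfy neither
```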

An alternative (following @anubhava's comment) is to put the hyphen at the beginning or end of the character set. There it does not need escaping, because it cannot create a range when nothing comes before or after it.
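A sketch comparing the three placements on the question's string; each pattern should split it identically (the $patterns keys are illustrative labels only):

```php
<?php
$str = "hyper text-markup_language";

// Three ways to put a literal hyphen into the character class.
$patterns = [
    'escaped'  => '/[_\- ]+/', // backslash-escaped hyphen
    'leading'  => '/[-_ ]+/',  // first in the class: nothing before it to form a range
    'trailing' => '/[_ -]+/',  // last in the class: nothing after it to form a range
];

$results = [];
foreach ($patterns as $name => $pattern) {
    $results[$name] = preg_split($pattern, $str);
}
print_r($results); // every entry is hyper, text, markup, language
```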

Upvotes: 9

Mr.lin

Reputation: 99

You should write it like this:

[-_ ]+
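For reference, a minimal sketch of this pattern in a full preg_split() call. Note that a literal space only matches the space character, so unlike \s it will not split on tabs (the tab-containing input is hypothetical):

```php
<?php
// Hyphen first, then '_' and a literal space.
$parts = preg_split('/[-_ ]+/', "hyper text-markup_language");
print_r($parts); // hyper, text, markup, language

// A tab is not in the set, so "hyper\ttext" stays in one piece.
$tabbed = preg_split('/[-_ ]+/', "hyper\ttext-markup");
print_r($tabbed); // "hyper<TAB>text", markup
```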

Upvotes: 1
