Reputation: 480
I am looking for a regular expression that would also identify and separate commas, the equal sign and any other special characters that I might need in the input.
Right now what I have is:
$content = preg_split('/[\s]+/', $file_content, -1, PREG_SPLIT_NO_EMPTY);
This stores the content of the input file in an array, with the elements split on whitespace.
However, for an input such as function a (int i) {};
the array would look like this:
[0] = function
[1] = a
[2] = (int
[3] = i)
[4] = {};
And what I'd like to achieve with the regular expression is this:
[0] = function
[1] = a
[2] = (
[3] = int
[4] = i
[5] = )
[6] = {
[7] = }
[8] = ;
Upvotes: 1
Views: 1022
Reputation: 47874
I recommend matching either a single non-letter (\PL) or a run of letters (\pL*), resetting the start of the reported match with \K, and then splitting on zero or more whitespace characters.
$input = 'function a (int i) {};';
var_export(
preg_split(
'/(?:\PL|\pL*)\K\s*/u',
$input,
-1,
PREG_SPLIT_NO_EMPTY
)
);
Compare the step counts of a few alternative patterns:
(?:\PL|\pL*)\K\s*
58 steps (with PREG_SPLIT_NO_EMPTY)
(?:\pL+|\S)\K\s*
59 steps (with PREG_SPLIT_NO_EMPTY)
([\p{P}\p{S}])|\s
75 steps (with PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE)
(?>\PL|\pL*)\K\s*(?!$)
85 steps (no flags needed)
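The \pL class matches any Unicode letter, and the u modifier makes PCRE treat the pattern and the subject as UTF-8. As a rough sketch, assuming UTF-8 input, the first pattern can be wrapped in a helper to check that accented identifiers stay in one piece. The function name splitOnLetters and the sample input are illustrative assumptions, not part of the answer.

```php
<?php
// Hypothetical helper around the \K-based pattern from this answer.
function splitOnLetters(string $input): array
{
    // \PL = one non-letter, \pL* = a run of letters; \K drops what was
    // matched so far, so only the trailing whitespace acts as the delimiter.
    return preg_split('/(?:\PL|\pL*)\K\s*/u', $input, -1, PREG_SPLIT_NO_EMPTY);
}

// Assumed sample input containing non-ASCII letters.
var_export(splitOnLetters('función ñ (int i) {};'));
```

Because digits are not letters, this pattern treats each digit as its own token, which may or may not be what you want.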
Upvotes: 0
Reputation: 92854
Use the preg_split function with the PREG_SPLIT_DELIM_CAPTURE flag:
PREG_SPLIT_DELIM_CAPTURE
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
$input = 'function a (int i) {};';
$content = preg_split('/([\p{P}\p{S}])|\s/', $input,
-1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($content);
The output:
Array
(
[0] => function
[1] => a
[2] => (
[3] => int
[4] => i
[5] => )
[6] => {
[7] => }
[8] => ;
)
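The question also mentions commas and the equals sign. As a small sketch, the same pattern can be wrapped in a helper to check how it treats them. The name splitTokens and the sample input are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the pattern shown above.
function splitTokens(string $input): array
{
    // Punctuation and symbols are captured and returned as tokens;
    // whitespace is a plain delimiter and is discarded.
    return preg_split(
        '/([\p{P}\p{S}])|\s/',
        $input,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
}

// Assumed sample input with an equals sign and a comma.
print_r(splitTokens('x = foo(a, b);'));
```

One thing to watch: \p{P} includes the underscore, so an identifier such as my_var would come back as three tokens (my, _, var).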
Upvotes: 3
Reputation: 16069
Instead of using a split function for this, you can use the following pattern in combination with preg_match_all():
[a-zA-Z]+|[^a-zA-Z\s]
It matches either one or more consecutive characters in [a-zA-Z], or a single character that is neither in [a-zA-Z] nor a whitespace character.
Here is an example:
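<?php
$string = 'function a (int i) {};';
// A run of ASCII letters, or one character that is neither a letter nor whitespace.
$regex = '/[a-zA-Z]+|[^a-zA-Z\s]/';
$matches = array();
preg_match_all($regex, $string, $matches);
// $matches[0] holds the full-pattern matches, i.e. the tokens.
print_r($matches[0]);
?>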
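With [a-zA-Z]+, digits and underscores count as "special" characters, so an identifier like x_1 is returned as x, _ and 1. If that is not wanted, one possible variant (an assumption on my part, not something the question asked for) is to use \w, which covers letters, digits and the underscore. The function name matchTokens and the sample input are made up for illustration.

```php
<?php
// Hypothetical variant: \w+ keeps letters, digits and underscores together.
function matchTokens(string $input): array
{
    // Either a run of word characters, or one character that is
    // neither a word character nor whitespace.
    preg_match_all('/\w+|[^\w\s]/', $input, $matches);
    return $matches[0];
}

// Assumed sample input with an underscore, digits, a comma and an equals sign.
print_r(matchTokens('int x_1 = foo(a, 42);'));
```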
Upvotes: 5