Reputation: 3488
I have a string that I need to explode in order to extract the information it contains.
Sample string:
"20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50"
First I explode the string on , and get
"20' Container 1"
"40' Open Container 1"
"40-45' Closed Container 3"
Now I want to split each of those pieces further so that I get the result in the format below:
array[
0 => [
0 => "20'"
1 => "Container"
2 => "1"
]
1 => [
0 => "40'"
1 => "Open Container"
2 => "1"
]
2 => [
0=> container roll
1=> 10
]
3=> [
0=> container lift
1 => 50
]
]
The strings may vary, but the format is fixed: length type number, where length is optional. I am doing:
$pattern = '/([\d-]*\')\s(.*)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
preg_match($pattern, $value, $matches); // Match length, type, number
$result[] = array_slice($matches, 1); // Slice with offset 1
$equipment->tokenized = $result;
}
and I get
Array
(
[0] => Array
(
[0] => 20'
[1] => container
[2] => 10
)
[1] => Array
(
[0] => 40'
[1] => open container
[2] => 10
)
[2] => Array
(
[0] => 40-45'
[1] => closed container
[2] => 20
)
[3] => Array
(
)
[4] => Array
(
)
)
Upvotes: 1
Views: 125
Reputation: 47874
I think I most agree with Erwin's answer. Even though this is not a validation task, I like that Jan's answer does a better job of defining the ranged "length" substring, whereas Erwin's pattern will also match ' 1. There is no indication that tabs or newlines exist in the input string, so a literal space is appropriate. Wrapping the regex pattern in double quotes means that the apostrophe in the pattern doesn't need to be escaped. For the record, Andreas's pattern is incorrect because it fails to match the "length" substring correctly and it includes the unwanted whitespace in the "type" substring.
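The point about strictness can be checked directly. The sketch below runs Erwin's pattern and the stricter pattern used further down in this answer against the malformed input ' 1; the variable names are mine and purely illustrative.

```php
<?php
// Erwin's pattern: every part of the length group is optional.
$loose = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
// Stricter pattern: a length needs at least one digit before the apostrophe.
$strict = "~(\d+(?:-\d+)?')? (\D+) (\d+)~";

$input = "' 1"; // malformed: no digits before the apostrophe, no type

// The loose pattern matches and treats the apostrophe as the "type".
$looseHit = preg_match($loose, $input, $looseMatch);
// The strict pattern finds nothing to match.
$strictHit = preg_match($strict, $input, $strictMatch);

var_dump($looseHit, $strictHit);
```

Since the task is parsing rather than validation, this only matters if malformed entries can appear in the input.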
Here is what I would use to parse the provided input:
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";
preg_match_all(
"~(\d+(?:-\d+)?')? (\D+) (\d+)~",
$string,
$matches,
PREG_SET_ORDER
);
print_r($matches); // use var_export() to show that no spaces are captured
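If the exact shape from the question is wanted (no full-match element, and no empty length element), the sets can be trimmed afterwards. This is a minimal sketch that assumes the same pattern and input as above; $result is a name I introduced.

```php
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

preg_match_all("~(\d+(?:-\d+)?')? (\D+) (\d+)~", $string, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $set) {
    array_shift($set);        // drop the full-string match
    if ($set[0] === '') {     // the optional length group did not participate
        array_shift($set);    // drop the empty placeholder as well
    }
    $result[] = $set;
}

print_r($result);
```

Entries with a length end up with three elements and entries without one end up with two, which mirrors the structure requested in the question.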
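Pattern Explanation:
(\d+(?:-\d+)?')? Length: an optional number or hyphenated range, followed by an apostrophe
(\D+) Type: one or more non-digit characters
(\d+) Number: one or more digits
Upvotes: 0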
Reputation: 23958
You can use * to make the first number and ' optional.
$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
var_dump($matches);
This gives an output like this:
array(4) {
[0]=>
array(5) {
[0]=>
string(15) "20' Container 1"
[1]=>
string(20) "40' Open Container 1"
[2]=>
string(22) "45' Closed Container 3"
[3]=>
string(18) " container roll 10"
[4]=>
string(18) " container lift 50"
}
[1]=>
array(5) {
[0]=>
string(3) "20'"
[1]=>
string(3) "40'"
[2]=>
string(3) "45'"
[3]=>
string(0) ""
[4]=>
string(0) ""
}
[2]=>
array(5) {
[0]=>
string(10) "Container "
[1]=>
string(15) "Open Container "
[2]=>
string(17) "Closed Container "
[3]=>
string(15) "container roll "
[4]=>
string(15) "container lift "
}
[3]=>
array(5) {
[0]=>
string(1) "1"
[1]=>
string(1) "1"
[2]=>
string(1) "3"
[3]=>
string(2) "10"
[4]=>
string(2) "50"
}
}
To get an array that is closer to what you want, you can use array_column() to regroup the matches.
$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
unset($matches[0]); // remove full match as it's not needed.
$res =[];
foreach($matches[1] as $key => $val){
$res[] = array_column($matches, $key);
}
var_dump($res);
Upvotes: -1
Reputation: 1981
Assuming that only length can be missing, you can try this pattern, which I revised from your existing one, plus the array_filter() function to remove empty elements from each $matches:
$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
preg_match($pattern, $value, $matches);
$result[] = array_slice(array_filter($matches), 1);
}
$equipment->tokenized = $result;
Revisions to your pattern:
Added ? after the first capture group so it can be skipped if not present.
Added \s? right after it to also skip the first space if the first group is not present.
Changed (.*) to (\D+) to match any characters that are not digits (assuming type never contains a number).
Note: I moved the line $equipment->tokenized = $result; outside the loop to set it only once instead of repeatedly setting it inside the loop.
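One caveat: array_filter() without a callback removes every falsy value, which includes the string "0" as well as the empty string. If a count of 0 can occur in the data (an assumption, the question does not say), a callback that removes only empty strings is safer. A minimal sketch, with $input standing in for $equipment->chassis_types and a made-up "0" entry:

```php
<?php
$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
// Stand-in for $equipment->chassis_types; the "0" count is a hypothetical edge case.
$input = "20' Container 1, container roll 0";

$result = [];
foreach (explode(', ', $input) as $value) {
    if (!preg_match($pattern, $value, $matches)) {
        continue; // skip entries that do not fit the format
    }
    // Remove only empty strings so that a count of "0" survives.
    $kept = array_filter($matches, function ($v) {
        return $v !== '';
    });
    $result[] = array_slice($kept, 1); // drop the full match; integer keys are reindexed
}

print_r($result);
```

The preg_match() return check is an extra guard I added so that a non-matching entry does not reuse or misread $matches.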
Upvotes: 0
Reputation: 43169
With the given examples you could go for
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";
$regex = "~
(?:(?P<group1>\d+(?:-\d+)?')\h*)?
(?P<group2>(?i:[a-z]+\h?)+)\h+
(?P<group3>\d+(?:'')?)
~x";
if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
print_r($matches);
}
?>
Array
(
[0] => Array
(
[0] => 20' Container 1
[group1] => 20'
[1] => 20'
[group2] => Container
[2] => Container
[group3] => 1
[3] => 1
)
[1] => Array
(
[0] => 40' Open Container 1
[group1] => 40'
[1] => 40'
[group2] => Open Container
[2] => Open Container
[group3] => 1
[3] => 1
)
[2] => Array
(
[0] => 40-45' Closed Container 3
[group1] => 40-45'
[1] => 40-45'
[group2] => Closed Container
[2] => Closed Container
[group3] => 3
[3] => 3
)
[3] => Array
(
[0] => container roll 10
[group1] =>
[1] =>
[group2] => container roll
[2] => container roll
[group3] => 10
[3] => 10
)
[4] => Array
(
[0] => container lift 50
[group1] =>
[1] =>
[group2] => container lift
[2] => container lift
[group3] => 50
[3] => 50
)
)
(?: # non-capturing group
(?P<group1>\d+(?:-\d+)?')\h* # group1 = digits, 1+ (-other digits), optionally
)? # make the whole group optional
(?P<group2>(?i:[a-z]+\h?)+)\h+ # group2 = [a-zA-Z]+ horizontal whitespaces, no digits
(?P<group3>\d+(?:'')?) # group3 = other digits + '', optionally
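To get from this output to the nested array asked for in the question, the named groups can be read directly. A minimal sketch, assuming the same regex and input as above; $result and $row are names I introduced.

```php
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";
$regex = "~
    (?:(?P<group1>\d+(?:-\d+)?')\h*)?
    (?P<group2>(?i:[a-z]+\h?)+)\h+
    (?P<group3>\d+(?:'')?)
~x";

$result = [];
if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $row = [$m['group2'], $m['group3']];   // type and number are always present
        if ($m['group1'] !== '') {
            array_unshift($row, $m['group1']); // prepend the length only when it matched
        }
        $result[] = $row;
    }
}

print_r($result);
```

Reading by name avoids the duplicated numeric and named keys that appear in the raw $matches output.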
Upvotes: 2