baig772

Reputation: 3488

Extracting parts of a string into an array in PHP

I have a string that I need to split up so I can extract the information in it.

Sample string:

"20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50"

First I explode the string on ", " and get:

"20' Container 1"
"40' Open Container 1"
"40-45' Closed Container 3"

Now I want to split each element of that exploded array as well, so that I get the result in the format below:

array[
    0 => [
        0 => "20'"
        1 => "Container"
        2 => "1"
    ]
    1 => [
        0 => "40'"
        1 => "Open Container"
        2 => "1"
    ]
    2 => [
        0 => "container roll"
        1 => "10"
    ]
    3 => [
        0 => "container lift"
        1 => "50"
    ]
]

The strings may vary, but the format is fixed: length, type, number, where the length is optional.

I am doing

$pattern = '/([\d-]*\')\s(.*)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
    preg_match($pattern, $value, $matches); // Match length, type, number
    $result[] = array_slice($matches, 1);   // Slice with offset 1
    $equipment->tokenized = $result;
}

and I get

Array
(
    [0] => Array
        (
            [0] => 20'
            [1] => container
            [2] => 10
        )

    [1] => Array
        (
            [0] => 40'
            [1] => open container
            [2] => 10
        )

    [2] => Array
        (
            [0] => 40-45'
            [1] => closed container
            [2] => 20
        )

    [3] => Array
        (
        )

    [4] => Array
        (
        )

)

Upvotes: 1

Views: 125

Answers (4)

mickmackusa

Reputation: 47874

I mostly agree with Erwin's answer. Although this is not a validation task, I like that Jan's answer does a better job of defining the ranged "length" substring, whereas Erwin's pattern will also match ' 1. There is no indication that tabs or newlines exist in the input string, so a literal space is appropriate. Wrapping the regex pattern in double quotes means that the apostrophe in the pattern doesn't need to be escaped. For the record, Andreas's pattern is incorrect because it fails to match the "length" substring correctly AND it includes the unwanted whitespace in the "type" substring.

Here is what I would use to parse the provided input: (Demo) (Pattern Demo)

$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

preg_match_all(
    "~(\d+(?:-\d+)?')? (\D+) (\d+)~",
    $string,
    $matches,
    PREG_SET_ORDER
);

print_r($matches);  // use var_export() to show that no spaces are captured

Pattern Explanation:

  1. Match one or more digits, then optionally match a hyphen followed by one or more digits, then match an apostrophe. This whole captured sequence is optional. (Length)
  2. Match, but do not capture, one space.
  3. Capture one or more non-digit characters. (Type)
  4. Match, but do not capture, one space.
  5. Capture one or more digits. (Number)
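
To get from these matches to the nested shape requested in the question, one option is to drop the full match and any empty "length" element from each row. The following is only a minimal sketch, assuming PHP 7.4+ (for the arrow function); $result is an illustrative variable name that is not part of the snippet above.

```php
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

preg_match_all(
    "~(\d+(?:-\d+)?')? (\D+) (\d+)~",
    $string,
    $matches,
    PREG_SET_ORDER
);

$result = [];
foreach ($matches as $match) {
    // Discard element 0 (the full match), keeping length, type, number.
    $parts = array_slice($match, 1);
    // Remove only empty strings (a missing length), then reindex from 0.
    $result[] = array_values(array_filter($parts, fn($part) => $part !== ''));
}

var_export($result);
```

Filtering with an explicit callback rather than a bare array_filter() is deliberate: a bare call would also discard a number of "0".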

Upvotes: 0

Andreas

Reputation: 23958

You can use * to make the first number and the ' optional.

$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
var_dump($matches);

This gives an output like this:

array(4) {
  [0]=>
  array(5) {
    [0]=>
    string(15) "20' Container 1"
    [1]=>
    string(20) "40' Open Container 1"
    [2]=>
    string(22) "45' Closed Container 3"
    [3]=>
    string(18) " container roll 10"
    [4]=>
    string(18) " container lift 50"
  }
  [1]=>
  array(5) {
    [0]=>
    string(3) "20'"
    [1]=>
    string(3) "40'"
    [2]=>
    string(3) "45'"
    [3]=>
    string(0) ""
    [4]=>
    string(0) ""
  }
  [2]=>
  array(5) {
    [0]=>
    string(10) "Container "
    [1]=>
    string(15) "Open Container "
    [2]=>
    string(17) "Closed Container "
    [3]=>
    string(15) "container roll "
    [4]=>
    string(15) "container lift "
  }
  [3]=>
  array(5) {
    [0]=>
    string(1) "1"
    [1]=>
    string(1) "1"
    [2]=>
    string(1) "3"
    [3]=>
    string(2) "10"
    [4]=>
    string(2) "50"
  }
}

To get an array that is closer to what you want, you can use array_column() to group the matches the way you want them.

$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
unset($matches[0]); // remove full match as it's not needed.

$res =[];
foreach($matches[1] as $key => $val){
    $res[] = array_column($matches, $key);
}
var_dump($res);

https://3v4l.org/4rGod

Upvotes: -1

Goma

Reputation: 1981

Assuming that only the length can be missing, you can try this pattern, which I revised from your existing one. I also added array_filter() to remove empty elements from each $matches array.

$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
$result = []; // initialize before appending
foreach (explode(', ', $equipment->chassis_types) as $value) {
    preg_match($pattern, $value, $matches);
    $result[] = array_slice(array_filter($matches), 1);
}
$equipment->tokenized = $result;

Revisions to your pattern:

  • ? after the first capture group so it can be skipped if not present
  • \s? right after it, to also skip the first space if the first group is not present
  • Changed (.*) to (\D+) to match any characters that are not digits (assuming the type never contains a number)

Note: I moved the line $equipment->tokenized = $result; outside the loop so that it is set only once instead of being reassigned on every iteration.
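
One caveat: array_filter() without a callback removes every falsy value, which includes the string "0", so a number of 0 would be dropped along with an empty length. A hedged variation that filters out only empty strings might look like the sketch below. The function name tokenizeChassisTypes() is hypothetical and used only for illustration, and the arrow function assumes PHP 7.4+.

```php
<?php
// Hypothetical helper; the pattern is the one shown in this answer.
function tokenizeChassisTypes(string $chassisTypes): array
{
    $pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
    $result = [];
    foreach (explode(', ', $chassisTypes) as $value) {
        if (!preg_match($pattern, $value, $matches)) {
            continue; // skip items that do not fit the expected format
        }
        // Drop the full match, then remove only empty strings so "0" survives.
        $parts = array_filter(array_slice($matches, 1), fn($part) => $part !== '');
        $result[] = array_values($parts);
    }
    return $result;
}

var_export(tokenizeChassisTypes("20' Container 1, container roll 10, container lift 0"));
```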

Upvotes: 0

Jan

Reputation: 43169

With the given examples you could go for

<?php

$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

$regex = "~
        (?:(?P<group1>\d+(?:-\d+)?')\h*)?
        (?P<group2>(?i:[a-z]+\h?)+)\h+
        (?P<group3>\d+(?:'')?)
        ~x";

if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
    print_r($matches);
}
?>

See a demo on regex101.com.


This yields:

Array
(
    [0] => Array
        (
            [0] => 20' Container 1
            [group1] => 20'
            [1] => 20'
            [group2] => Container
            [2] => Container
            [group3] => 1
            [3] => 1
        )

    [1] => Array
        (
            [0] => 40' Open Container 1
            [group1] => 40'
            [1] => 40'
            [group2] => Open Container
            [2] => Open Container
            [group3] => 1
            [3] => 1
        )

    [2] => Array
        (
            [0] => 40-45' Closed Container 3
            [group1] => 40-45'
            [1] => 40-45'
            [group2] => Closed Container
            [2] => Closed Container
            [group3] => 3
            [3] => 3
        )

    [3] => Array
        (
            [0] => container roll 10
            [group1] => 
            [1] => 
            [group2] => container roll
            [2] => container roll
            [group3] => 10
            [3] => 10
        )

    [4] => Array
        (
            [0] => container lift 50
            [group1] => 
            [1] => 
            [group2] => container lift
            [2] => container lift
            [group3] => 50
            [3] => 50
        )

)


The core regex is

(?:                               # non-capturing group
    (?P<group1>\d+(?:-\d+)?')\h*  # group1 = 1+ digits, optionally -digits, then '
)?                                # make the whole group optional
(?P<group2>(?i:[a-z]+\h?)+)\h+    # group2 = words of letters separated by horizontal whitespace, no digits
(?P<group3>\d+(?:'')?)            # group3 = 1+ digits, optionally followed by ''
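
Because each match holds every capture twice (once under its name and once under a numeric index), it may be convenient to keep only the named keys. A minimal sketch, assuming the same $string and $regex as above; $rows is an illustrative variable name:

```php
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

$regex = "~
        (?:(?P<group1>\d+(?:-\d+)?')\h*)?
        (?P<group2>(?i:[a-z]+\h?)+)\h+
        (?P<group3>\d+(?:'')?)
        ~x";

$rows = [];
if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // Keep only the string (named) keys; numeric duplicates are dropped.
        $rows[] = array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);
    }
}

print_r($rows);
```

Each row then contains just group1, group2 and group3, with group1 left as an empty string when no length is present.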

Upvotes: 2
