Abuda Dumiaty

Reputation: 317

Extract fragments of text via PHP and REGEXP

Assuming I have the string variable:

$str = '
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]

1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6
7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6
12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7
17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0
';

I would like to extract some information from that variable into an array that looks like this:

Array {
    ["WhiteTitle"] => "GM",
    ["WhiteCountry"] => "Cuba",
    ["BlackCountry"] => "United States"
}

Thanks.

Upvotes: 0

Views: 93

Answers (3)

Wiktor Stribiżew

Reputation: 626748

Here is a safer and more compact solution:

$re = '~\[([^]["]*?)\s*"([^]"]+)~';   // Defining the regex
$str = "[WhiteTitle \"GM\"]\n[WhiteCountry \"Cuba\"]\n[BlackCountry \"United States\"]\n\n1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6\n7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6\n12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7\n17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0"; 
preg_match_all($re, $str, $matches);  // Getting all matches
print_r(array_combine($matches[1],$matches[2])); // Creating the final array with array_combine

See IDEONE PHP demo, and a regex demo.

Regex details:

  • \[ - opening [
  • ([^]["]*?) - Group 1 matching 0+ characters other than ", [ and ], as few as possible, stopping before
  • \s* - 0+ whitespace characters (so Group 1 carries no trailing whitespace)
  • " - a double quote
  • ([^]"]+) - Group 2 matching 1+ characters other than ] and "
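As a rough sketch of how this pattern might be packaged for reuse, the snippet below wraps it in a function. The name extractTags and the shortened input string are assumptions made for illustration and are not part of the answer; the early return covers input that contains no tags.

```php
<?php
// Hypothetical helper (not from the answer): returns tag name => tag value.
function extractTags(string $text): array
{
    // Same pattern as above: name in group 1, quoted value in group 2.
    $re = '~\[([^]["]*?)\s*"([^]"]+)~';
    if (!preg_match_all($re, $text, $m)) {
        return []; // no tags found, or the regex engine reported an error
    }
    // Pair each captured name with the value captured at the same position.
    return array_combine($m[1], $m[2]);
}

$tags = extractTags("[WhiteTitle \"GM\"]\n[BlackCountry \"United States\"]\n\n1. d4 d5 1-0");
print_r($tags);
```

Since array_combine uses Group 1 as the keys, a tag name that appears twice would keep only its last value.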

Upvotes: 1

Pedro Lobito

Reputation: 98881

You can use something like:

<?php
$string = <<< EOF
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]
1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6
7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6
12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7
17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0
EOF;

$final = array();
preg_match_all('/\[(.*?)\s+(".*?")\]/', $string, $matches, PREG_PATTERN_ORDER);
for($i = 0; $i < count($matches[1]); $i++) {
    $final[$matches[1][$i]] = $matches[2][$i];
}

print_r($final);

Output:

Array
(
    [WhiteTitle] => "GM"
    [WhiteCountry] => "Cuba"
    [BlackCountry] => "United States"
)
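The values in the output above keep their double quotes because the quotes sit inside the second capturing group. If that is not wanted, one possible variation (a sketch, not part of the original answer, with a shortened move list) is to move the quotes outside the group so they are matched but not captured:

```php
<?php
$string = "[WhiteTitle \"GM\"]\n[WhiteCountry \"Cuba\"]\n[BlackCountry \"United States\"]\n1. d4 d5 1-0";

// The quotes are now outside group 2, so they are matched but not captured.
preg_match_all('/\[(.*?)\s+"(.*?)"\]/', $string, $matches, PREG_PATTERN_ORDER);

$final = array();
foreach ($matches[1] as $i => $name) {
    // $matches[1] holds the names, $matches[2] the values at the same index.
    $final[$name] = $matches[2][$i];
}
print_r($final);
```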

Ideone Demo:

http://ideone.com/wQYshT


Regex Explanation:

\[(.*?)\s+(".*?")\]

Match the character “[” literally «\[»
Match the regex below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 2 «(".*?")»
   Match the character “"” literally «"»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “"” literally «"»
Match the character “]” literally «\]»
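The explanation above can also be kept next to the pattern itself. The sketch below assumes the PCRE x (extended) modifier, which ignores whitespace in the pattern and treats # as the start of a comment; the short test string is made up for illustration.

```php
<?php
// Same pattern as above, spread over several lines with the x modifier.
$pattern = '/
    \[          # literal opening bracket
    (.*?)       # group 1: tag name, as short as possible
    \s+         # one or more whitespace characters
    (".*?")     # group 2: quoted value, quotes included
    \]          # literal closing bracket
/x';

preg_match_all($pattern, '[WhiteTitle "GM"] [WhiteCountry "Cuba"]', $m);
print_r($m[1]); // tag names
print_r($m[2]); // values, still wrapped in double quotes
```

The comments must not contain the pattern delimiter (here /), since PHP would treat it as the end of the pattern.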

Upvotes: 1

Ghulam Ali

Reputation: 1935

You can use:

preg_match_all('/\[(.*?) "(.*?)"\]/m', $str, $matches, PREG_SET_ORDER);
print_r($matches);

This returns all matches in one array. Within each match, key 0 holds the complete match, key 1 the tag name, and key 2 the value:

Output:

Array
(
    [0] => Array
        (
            [0] => [WhiteTitle "GM"]
            [1] => WhiteTitle
            [2] => GM
        )

    [1] => Array
        (
            [0] => [WhiteCountry "Cuba"]
            [1] => WhiteCountry
            [2] => Cuba
        )

    [2] => Array
        (
            [0] => [BlackCountry "United States"]
            [1] => BlackCountry
            [2] => United States
        )
)

If you want the format from the question, a simple loop over the matches will do:

$array = array();
foreach($matches as $match){
    $array[$match[1]] = $match[2];
}
print_r($array);

Output:

Array
(
    [WhiteTitle] => GM
    [WhiteCountry] => Cuba
    [BlackCountry] => United States
)
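As an alternative to the loop, the same array can probably be built in one call with array_column (available since PHP 5.5), because PREG_SET_ORDER yields one row per match. This is a sketch with a shortened move list, not part of the original answer:

```php
<?php
$str = "[WhiteTitle \"GM\"]\n[WhiteCountry \"Cuba\"]\n[BlackCountry \"United States\"]\n\n1. d4 d5 1-0";

preg_match_all('/\[(.*?) "(.*?)"\]/', $str, $matches, PREG_SET_ORDER);

// Take column 2 (the value) of every match and index it by column 1 (the name).
$array = array_column($matches, 2, 1);
print_r($array);
```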

Upvotes: 1
