Sarah

Reputation: 13051

PHP, Regular expression to parse data

I have data in the format:

Football - 101 Carolina Panthers +15 -110 for Game

Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game

Football - 102 Pittsburgh Steelers -9 -120 for 1st Half


How can I transform this into a PHP array like the following?

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 101,
                      'game_name'   => 'Carolina Panthers',
                      'runline_odd' => '+15 -110',
                      'total_odd'   => '',
                      'odd_type'    => 'runline',
                      'period'      => 'Game' );

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 101,
                      'game_name'   => 'Carolina Panthers/Pittsburgh Steelers',
                      'runline_odd' => '',
                      'total_odd'   => 'under 36½ -110',
                      'odd_type'    => 'total_odd',
                      'period'      => 'Game' );

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 102,
                      'game_name'   => 'Pittsburgh Steelers',
                      'runline_odd' => '-9 -120',
                      'total_odd'   => '',
                      'odd_type'    => 'runline',
                      'period'      => '1st Half' );

Upvotes: 0

Views: 212

Answers (2)

nikc.org

Reputation: 16952

Normally I wouldn't solve the whole problem for someone, but the ½ character made it interesting enough. I'm no regex expert, so this might not be the most optimized or elegant solution, but it gets the job done, at least with the provided sample input.

EDIT: Oops. I didn't catch that under is actually part of the odds data (the total_odd value), so this does not currently get the job done. I'll be back.

EDIT2: Revised the regex slightly; it now correctly distinguishes between runline_odd and runline_total (the named group that captures the total odds).

<?php
$input = array(
'Football - 101 Carolina Panthers +15 -110 for Game',
'Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game',
'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half'
);

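// The named groups feed the result array built below. The pattern relies on the /u
// modifier (applied in preg_match) so that \x{00BD} is matched as the UTF-8 ½ character.
$regex = '^(?<sport_type>[[:alpha:]]*) - '.
         '(?<game_number>[0-9]*) '.
         '('.
            // Alternative 1: a total ("under", a number, then the price).
            '(?<game_nameb>[[:alpha:]\/ ]*?) '.
            '(?<runline_total>(under ([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
         '|'.
            // Alternative 2: a runline (optionally signed spread, then the price).
            '(?<game_namea>[[:alpha:]\/ ]*) '.
            '(?<runline_odd>((-|\+)?([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
         ')'.
         '(?<period>.*)$';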


$game_data = array();

foreach ($input as $in) {
    $matches = false;
    $cnt = preg_match('/' . $regex . '/ui', $in, $matches);

    if ($cnt && is_array($matches) && count($matches)) {
        if (empty($matches['game_nameb'])) {
            $game_name = $matches['game_namea'];
            $runline_odd = $matches['runline_odd'];
            $total_odd = '';
        } else {
            $game_name = $matches['game_nameb'];
            $runline_odd = '';
            $total_odd = $matches['runline_total'];
        }


        $result = array(
            'sport_type' => $matches['sport_type'],
            'game_number' => $matches['game_number'],
            'game_name' => $game_name,
            'runline_odd' => $runline_odd,
            'total_odd' => $total_odd,
            'period' => $matches['period']
        );

        array_push($game_data, $result);
    }
}

var_dump($game_data);

This produces the following:

$ /usr/local/bin/php preg-match.php 
array(3) {
  [0]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "101"
    ["game_name"]=>
    string(17) "Carolina Panthers"
    ["runline_odd"]=>
    string(8) "+15 -110"
    ["total_odd"]=>
    string(0) ""
    ["period"]=>
    string(4) "Game"
  }
  [1]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "101"
    ["game_name"]=>
    string(37) "Carolina Panthers/Pittsburgh Steelers"
    ["runline_odd"]=>
    string(0) ""
    ["total_odd"]=>
    string(15) "under 36½ -110"
    ["period"]=>
    string(4) "Game"
  }
  [2]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "102"
    ["game_name"]=>
    string(19) "Pittsburgh Steelers"
    ["runline_odd"]=>
    string(7) "-9 -120"
    ["total_odd"]=>
    string(0) ""
    ["period"]=>
    string(8) "1st Half"
  }
}
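
The arrays in the question also carry an odd_type key and an integer game_number, neither of which appears in the output above. A minimal sketch of one way to add them afterwards, assuming exactly one of runline_odd and total_odd is non-empty per row; the helper name finish_row is hypothetical and not part of the code above:

```php
<?php
// Hypothetical helper: derive 'odd_type' from whichever odds field is filled
// and cast the game number to an integer, as in the question's target arrays.
function finish_row(array $row) {
    $row['game_number'] = (int) $row['game_number'];
    $row['odd_type'] = ($row['total_odd'] !== '') ? 'total_odd' : 'runline';
    return $row;
}

// Two rows shaped like the entries of $game_data produced above.
$game_data = array(
    array('sport_type' => 'Football', 'game_number' => '101',
          'game_name' => 'Carolina Panthers',
          'runline_odd' => '+15 -110', 'total_odd' => '', 'period' => 'Game'),
    array('sport_type' => 'Football', 'game_number' => '101',
          'game_name' => 'Carolina Panthers/Pittsburgh Steelers',
          'runline_odd' => '', 'total_odd' => "under 36\xC2\xBD -110", // \xC2\xBD is ½ in UTF-8
          'period' => 'Game'),
);

$game_data = array_map('finish_row', $game_data);
print_r($game_data);
```

The new key is appended at the end of each row rather than before period; for keyed lookups the order makes no difference.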

Upvotes: 1

Vikas

Reputation: 8948

The following works, except when there is an under after the game name:

/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/

Explanation:

([^-]+): Matches anything other than -; this is the sport type, which the - separates from the other details.
\s*-\s*: The - separator, optionally surrounded by whitespace.
(\d+)  : Game number.
([^\d+-]+): Anything other than +, -, or a digit. Matches the game name.
((?:under\s*)?[\d\s+-]+): Runline odd or total odd.
\s*for\s*(.+): The literal for, followed by the period.
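
As a rough illustration of the groups described above, here is the pattern applied unchanged to the two sample lines it handles. The variable names are my own, and the trim step is needed because several groups capture trailing whitespace:

```php
<?php
// Pattern copied verbatim from the answer above.
$pattern = '/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/';

$lines = array(
    'Football - 101 Carolina Panthers +15 -110 for Game',
    'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half',
);

$rows = array();
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        // Drop the full match at index 0 and trim the whitespace
        // that the character classes pick up around each group.
        $rows[] = array_map('trim', array_slice($m, 1));
    }
}
print_r($rows);
```

Each row holds, in order: sport type, game number, game name, odds, and period.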

PS:

  1. Take care of the cases where there is an 'under'. The regex above lumps it in with game_name.
  2. Take care of Unicode characters such as ½.
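
One possible way to address both points is sketched below. It is an untested-in-production revision, not the original pattern: the game name is matched lazily so that under stays with the odds, and the /u modifier plus \x{00BD} lets the number contain ½. The group name odds is my own, and only the under keyword from the sample data is handled; other keywords would have to be added to the alternation.

```php
<?php
// Hypothetical revision of the pattern above. Group names follow the question's
// keys, except 'odds', which is my own name for the combined odds text.
$pattern = '/^(?<sport_type>[^-]+?)\s*-\s*'   // sport type, up to the " - " separator
         . '(?<game_number>[0-9]+)\s+'        // game number
         . '(?<game_name>.+?)\s+'             // team name(s), matched lazily
         . '(?<odds>(?:under\s+)?'            // optional "under" keyword
         . '[+-]?[0-9\x{00BD}]+\s+'           // spread or total; may contain ½ (U+00BD)
         . '[+-]?[0-9]+)\s+'                  // price, e.g. -110
         . 'for\s+(?<period>.+)$/u';          // /u: treat pattern and subject as UTF-8

$lines = array(
    'Football - 101 Carolina Panthers +15 -110 for Game',
    // \xC2\xBD is the UTF-8 encoding of ½, written out to avoid file-encoding issues.
    "Football - 101 Carolina Panthers/Pittsburgh Steelers under 36\xC2\xBD -110 for Game",
    'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half',
);

$parsed = array();
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        // Keep only the named groups, dropping the numeric duplicates.
        $parsed[] = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
    }
}
print_r($parsed);
```

Whether a row is a runline or a total can then be decided by checking whether its odds value starts with under.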

Upvotes: 1
