Zachary Tyler Wiggins

Reputation: 19

preg_match with expressions

I have two types of lines that are being pulled from RSS. Those are the games being played and the games that haven't started. The code I have below only pulls the games that have started.

$string = "Tampa Bay 6 Florida 5 (FINAL - OT)
Ottawa 0 Toronto 3 (FINAL)
NY Islanders at Ottawa (7:30 PM ET)
Toronto at Tampa Bay (7:30 PM ET)
San Jose at St. Louis (8:00 PM ET)
Detroit at Nashville (8:00 PM ET)
Vancouver at Chicago (8:30 PM ET)
Los Angeles at Edmonton (10:00 PM ET)";

preg_match_all("/^(\D+)(\d+ )(\D+)(\d+ )\s*\((.*)\)$/m", $string, $result);

echo "<pre>";

print_r($result);

The important thing is that the teams remain in the same array for both types of lines. I have no idea what to do. Any help? I'm sure it's just an expression I'm forgetting.

Upvotes: 0

Views: 405

Answers (1)

kjetilh

Reputation: 4976

Phew. Here's what I came up with. It uses conditional patterns to match the scores if there are any, and it sneakily "removes" the 'at' if no scores are found (indicating an unstarted match). It uses the same technique for the second team's score. Lastly, it matches the contents of the final parenthesized group.

Captured groups:

  1. Team 1 name
  2. Team 1 score (empty if 'at')
  3. Team 2 name
  4. Team 2 score (empty if 'at')
  5. Contents of the last parentheses

    preg_match_all( '#^([a-zA-Z. ]+)(?(?=\d+)(\d+)\s+|\s+at\s+)([a-zA-Z. ]+)(?(?=\d+)(\d+))\s+\(([^)]+)\)#m', $str, $matches );

    // note to edit-happy users, the regex cannot be formatted properly on SO.

The only catch is that you should trim the team names, as there might be unwanted whitespace in them.

Breakdown

  • ([a-zA-Z. ]+) - matches the first team name (the period allows names such as St. Louis)
  • (?(?=\d+)(\d+)\s+|\s+at\s+) - uses a conditional check along with a look-ahead assertion to see if a digit is found; if yes, then capture team 1's score, otherwise match but DON'T capture the at string (ninja skills:)).
  • ([a-zA-Z. ]+) - matches the second team name
  • (?(?=\d+)(\d+)) - another conditional check for the second team's score
  • \s+ - match some whitespace before the remaining parenthesized bit
  • \(([^)]+)\) - match everything inside the parentheses
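
To see the pieces working together, here is a minimal sketch that runs the pattern over a few of the lines from the question and trims the team names. It assumes the literal parentheses are escaped as \( and \), and that a period is allowed in team names so that St. Louis matches. The $games array and its keys (team1, score1, team2, score2, status) are hypothetical names chosen for illustration only.

```php
<?php
// Sample lines copied from the question, joined with explicit newlines.
$str = implode("\n", [
    'Tampa Bay 6 Florida 5 (FINAL - OT)',
    'Ottawa 0 Toronto 3 (FINAL)',
    'NY Islanders at Ottawa (7:30 PM ET)',
    'San Jose at St. Louis (8:00 PM ET)',
]);

// Conditional subpatterns: capture a score if a digit follows,
// otherwise consume the "at" separator without capturing it.
$pattern = '#^([a-zA-Z. ]+)(?(?=\d+)(\d+)\s+|\s+at\s+)([a-zA-Z. ]+)(?(?=\d+)(\d+))\s+\(([^)]+)\)#m';

// PREG_SET_ORDER gives one array per matched line rather than one per group.
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

$games = [];
foreach ($matches as $m) {
    $games[] = [
        'team1'  => trim($m[1]),                       // names can carry a trailing space
        'score1' => $m[2] === '' ? null : (int) $m[2], // empty for unstarted games
        'team2'  => trim($m[3]),
        'score2' => $m[4] === '' ? null : (int) $m[4],
        'status' => $m[5],                             // text inside the parentheses
    ];
}

print_r($games);
```

Both kinds of line end up with the team names under the same keys, which is what the question asked for; only the score fields differ (null when the game has not started).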

Upvotes: 2
