Vishnu Simbu

Reputation: 175

Unable to solve this Complex Regex

I have some strings like the ones below. What I need is to remove the time range from each line. For example, from:

Uefa Europa League Simulcast 8.00am-10.00Am

I only need:

Uefa Europa League Simulcast

I am new to regex. Can someone please guide me?

Below are some example strings I am working with:

Upvotes: 1

Views: 102

Answers (3)

Déjà vu

Reputation: 28850

<?php

$str = array("Uefa Europa League Simulcast 8.00am-10.00Am",
"MLB: Chicago White Sox at Los Angeles Angels 8.00PM-11.00AM",
"MLB Spring: Chicago Cubs at Los Angeles Dodgers 8.00aM-11.00PM",
"Live Test Cricket New Zealand v England: Second Test Day Two 9:00am-5:00am",
"UEFA Europa League Hightlights 10.00pm-11.00pm");


foreach ($str as $s) {
   $new = preg_replace('/\s*\d?\d[\.\:]\d\d[ap]m-\d?\d[\.\:]\d\d[ap]m\s*$/i', '', $s);
   echo "$new\n";
}

Output:

Uefa Europa League Simulcast
MLB: Chicago White Sox at Los Angeles Angels
MLB Spring: Chicago Cubs at Los Angeles Dodgers
Live Test Cricket New Zealand v England: Second Test Day Two
UEFA Europa League Hightlights

Explanation of /\s*\d?\d[\.\:]\d\d[ap]m-\d?\d[\.\:]\d\d[ap]m\s*$/i

  • \s* means zero or more whitespaces
  • \d? means zero or one digit
  • \d means a digit
  • [\.\:] means . or :
  • [ap]m means a or p followed by m, i.e. am or pm
  • - matches a literal dash
  • $ means end of string
  • /.../i means case insensitive
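The same pattern can be written with the x (extended) modifier, which lets each piece carry its own comment. This is a sketch, assuming PCRE's standard x behaviour (unescaped whitespace outside character classes is ignored and # starts a comment); it also uses [.:] in place of [\.\:], which is equivalent inside a character class. The comments deliberately contain no / so PHP does not mistake one for the closing delimiter.

```php
<?php
// Same pattern as above, spread over several lines with the x modifier.
$pattern = '/
    \s*                     # optional whitespace before the time range
    \d?\d                   # one or two digits for the hour
    [.:]                    # a dot or a colon
    \d\d                    # two digits for the minutes
    [ap]m                   # am or pm, any case thanks to the i modifier
    -                       # a literal dash
    \d?\d [.:] \d\d [ap]m   # the second time, same shape as the first
    \s*$                    # optional trailing whitespace, then end of string
/ix';

$title = preg_replace($pattern, '', 'Uefa Europa League Simulcast 8.00am-10.00Am');
echo $title, "\n"; // Uefa Europa League Simulcast
```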

Upvotes: 5

DaveRandom

Reputation: 88707

A regular expression describes a pattern, so let's break down the example strings you have provided and look for some regularity:

8.00am-10.00Am
8.00PM-11.00AM
8.00aM-11.00PM
9:00am-5:00am
10.00pm-11.00pm

So I can see that:

  • it always starts with either one or two digits. This can be expressed as \d{1,2}
  • this is followed by either a dot . or a colon :. This can be expressed as [.:]
  • this is always followed by 2 digits: \d{2}
  • then the string AM or PM, in any case. To make the match case-insensitive, we can apply the i modifier to the entire pattern, so we only need to match am or pm, written as (am|pm). That creates a capturing group, but we don't need to capture anything, so we can optimise the expression with (?:am|pm), a non-capturing group
  • then a dash -, this can be expressed literally
  • then the first part of the pattern is repeated.
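To see what the non-capturing group changes, here is a small sketch that runs a single-time fragment against "10.00Am" with preg_match(). The capturing version reports the am/pm text as an extra element of the matches array; the non-capturing version reports only the full match. The subject string is just an illustrative choice.

```php
<?php
// Capturing group: the am/pm part is stored as an extra match.
preg_match('/\d{1,2}[.:]\d{2}(am|pm)/i', '10.00Am', $capturing);

// Non-capturing group: only the full match is stored.
preg_match('/\d{1,2}[.:]\d{2}(?:am|pm)/i', '10.00Am', $nonCapturing);

print_r($capturing);    // [0] => 10.00Am, [1] => Am
print_r($nonCapturing); // [0] => 10.00Am
```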

So when we put it all together, we get:

/\d{1,2}[.:]\d{2}(?:am|pm)-\d{1,2}[.:]\d{2}(?:am|pm)/i
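Because the second half of the range repeats the first, one way to keep the two halves in sync is to build the pattern from a single piece. This is a sketch; $time is a hypothetical variable name for illustration.

```php
<?php
// One time such as "8.00am" or "9:00am".
$time = '\d{1,2}[.:]\d{2}(?:am|pm)';

// A range is a time, a literal dash, then the same time shape again.
$pattern = '/' . $time . '-' . $time . '/i';

echo $pattern, "\n";
// /\d{1,2}[.:]\d{2}(?:am|pm)-\d{1,2}[.:]\d{2}(?:am|pm)/i
```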

But wait... this will leave the space that preceded the time range as trailing whitespace on the result. We could just trim() this off, but we can also get the regular expression to handle it for us by prepending \s* to the pattern.
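A quick sketch comparing the two options on the first example string. For these inputs they give the same result; one difference worth knowing is that trim() also strips leading whitespace, whereas the \s* prefix only removes whitespace immediately before the time range.

```php
<?php
$subject = 'Uefa Europa League Simulcast 8.00am-10.00Am';

// Option 1: remove only the time range, then trim() the leftover space.
$viaTrim = trim(preg_replace('/\d{1,2}[.:]\d{2}(?:am|pm)-\d{1,2}[.:]\d{2}(?:am|pm)/i', '', $subject));

// Option 2: let the pattern consume the whitespace with a leading \s*.
$viaPattern = preg_replace('/\s*\d{1,2}[.:]\d{2}(?:am|pm)-\d{1,2}[.:]\d{2}(?:am|pm)/i', '', $subject);

var_dump($viaTrim === $viaPattern); // bool(true)
```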

/\s*\d{1,2}[.:]\d{2}(?:am|pm)-\d{1,2}[.:]\d{2}(?:am|pm)/i

So now we just need to replace anything that matches this pattern with an empty string. You can do this in PHP with preg_replace():

<?php

$pattern = '/\s*\d{1,2}[.:]\d{2}(?:am|pm)-\d{1,2}[.:]\d{2}(?:am|pm)/i';
$string = 'Uefa Europa League Simulcast 8.00am-10.00Am';

$string = preg_replace($pattern, '', $string);

var_dump($string);
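preg_replace() also accepts an array of subject strings and returns an array of results, so all the titles can be cleaned in one call. A sketch using two of the example strings:

```php
<?php
$pattern = '/\s*\d{1,2}[.:]\d{2}(?:am|pm)-\d{1,2}[.:]\d{2}(?:am|pm)/i';
$strings = array(
    'Uefa Europa League Simulcast 8.00am-10.00Am',
    'Live Test Cricket New Zealand v England: Second Test Day Two 9:00am-5:00am',
);

// With an array subject, preg_replace() returns an array (keys preserved).
$cleaned = preg_replace($pattern, '', $strings);
print_r($cleaned);
```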


Upvotes: 1

Dennis

Reputation: 32608

No need for a regular expression: since the time range is always the last space-separated part of each string, you can just cut the string at the last space:

<?php

$str = array(
    "Uefa Europa League Simulcast 8.00am-10.00Am",
    "MLB: Chicago White Sox at Los Angeles Angels 8.00PM-11.00AM",
    "MLB Spring: Chicago Cubs at Los Angeles Dodgers 8.00aM-11.00PM",
    "Live Test Cricket New Zealand v England: Second Test Day Two 9:00am-5:00am",
    "UEFA Europa League Hightlights 10.00pm-11.00pm"
);

foreach ($str as $s) {
    $trimmed = substr($s, 0, strrpos($s, ' '));
    echo "$trimmed\n";
}

?>
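One edge case: strrpos() returns false when the string contains no space, and in PHP's default (non-strict) mode substr($s, 0, false) then treats false as 0 and returns an empty string. A guarded sketch is below; cutAtLastSpace() is a hypothetical helper name, not a PHP built-in. Note that this approach assumes the time range itself never contains a space (e.g. "8.00 am-10.00 am" would break it), and unlike the regex answers it removes the last word even when no time range is present.

```php
<?php
// Hypothetical helper: cut at the last space, but leave strings
// without any space unchanged instead of returning "".
function cutAtLastSpace(string $s): string
{
    $pos = strrpos($s, ' ');
    if ($pos === false) {
        return $s; // no space, nothing to cut
    }
    return substr($s, 0, $pos);
}

echo cutAtLastSpace('UEFA Europa League Hightlights 10.00pm-11.00pm'), "\n";
echo cutAtLastSpace('NoSpacesHere'), "\n";
```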

Upvotes: 1
