Tom Harris

Reputation: 11

Use regex to find a phrase UP TO 16 characters? (php)

Little problem I'm facing. I've got a long string with many words in it, and I'm trying to split it up. Most parts of the string have a static start and end that I can reference, but this one only has a static end. The bit of string I'm trying to get is dynamic: it is up to 16 characters long (it could be less), and the number of words in the phrase is unknown.

Example:

Name: John Smith Occupation: Doctor Currently Busy Gender: Male 

I want to get "Currently Busy" on its own, without also getting the end of the value that comes before it ("Doctor").

But I also want to use the same code to get "Not Yet Here" from this string:

Name: John Smith Occupation: Doctor Not Yet Here Gender: Male 

I can't find a way, and I don't even know if it's possible, so hopefully someone here can help me out.

Upvotes: 1

Views: 156

Answers (2)

HamZa

Reputation: 14931

Not the most elegant way, but here's a solution:

$string = 'Name: John Smith Occupation: Doctor Currently Busy Gender: Male';
$groups = array_filter(preg_split('/\s?\w+:\s?/', $string));
// Split by [\s? => optional space][\w+ => characters a-zA-Z0-9_][:][\s? => optional space]

// $groups[2] contains 'Doctor Currently Busy'
$pieces = explode(' ', $groups[2]);
$pieces = array_reverse($pieces);
// Counters and accumulator for the loop
$length = 0;
$i = 0;
$c = count($pieces);
$result = array();
// $c and $i keep the first word (the occupation) out of the result when all the words together are shorter than 16 characters

foreach($pieces as $piece){
    $length += strlen($piece) + (empty($result) ? 0 : 1); // also count the space that joins this word to the phrase
    $i++;
    if($length <= 16 && $c != $i){
        $result[] = $piece;
    }else{
        break;
    }
}

$result = array_reverse($result);
$final_result = implode(' ', $result);
echo $final_result; // Currently Busy
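
For comparison, the same "longest run of whole words, at most 16 characters, right before Gender:" idea can be sketched as a single regex. This is only a sketch: the helper name phraseBeforeGender is made up for illustration, and unlike the loop above it has no guard that keeps the first word out, so it assumes the occupation plus the phrase together are longer than 16 characters.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of at
// most $max characters that starts right after a space and ends right before
// " Gender:". Returns null when nothing matches.
function phraseBeforeGender(string $text, int $max = 16): ?string
{
    // (?<=\s)        the match must start right after whitespace (a word start)
    // (.{1,16})      capture between 1 and $max characters
    // (?=\s+Gender:) the capture must be followed by whitespace and "Gender:"
    $pattern = '/(?<=\s)(.{1,' . $max . '})(?=\s+Gender:)/';

    if (preg_match($pattern, $text, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo phraseBeforeGender('Name: John Smith Occupation: Doctor Currently Busy Gender: Male'), "\n";
// Currently Busy
echo phraseBeforeGender('Name: John Smith Occupation: Doctor Not Yet Here Gender: Male'), "\n";
// Not Yet Here
```

Because the regex engine tries start positions from left to right, the first position that works is the earliest word start from which the remaining text up to " Gender:" fits in 16 characters.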

Upvotes: 0

Thomas Kelley

Reputation: 10302

Your problem is one that regex may not be able to solve. If the value of "Occupation" can be one or more words, and it's directly followed by another value that could also be one or more words, then how would you tell the two phrases apart, even as a human?

I'm hoping that at the very least, you have a set of known Occupation values. If that's the case, then you can craft your expression like this:

(?<=Doctor |Nurse ).*(?= Gender)

The (?<=...) and (?=...) bits are lookbehind and lookahead assertions. Together they essentially say: "make sure that Doctor or Nurse (followed by a space) appears right before the matched phrase, but do not include it in the match, and that Gender (preceded by a space) appears right after the matched phrase, but do not include that either."

See this in action: http://regexr.com?34buq
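
In PHP, the expression above could be used roughly like this. It is a minimal sketch: the function name extractStatus and the $occupations list are assumptions for illustration, and it only works if every possible occupation is in that list.

```php
<?php
// Hypothetical helper (not part of the original answer): builds the
// lookbehind from a known list of occupations and returns the phrase
// between the occupation and " Gender", or null if nothing matches.
function extractStatus(string $text, array $occupations): ?string
{
    // Turn ['Doctor', 'Nurse'] into 'Doctor |Nurse ', escaping any regex
    // metacharacters in the occupation names.
    $alternatives = implode('|', array_map(
        function ($occupation) {
            return preg_quote($occupation, '/') . ' ';
        },
        $occupations
    ));

    // Same shape as (?<=Doctor |Nurse ).*(?= Gender)
    $pattern = '/(?<=' . $alternatives . ').*(?= Gender)/';

    if (preg_match($pattern, $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

$occupations = ['Doctor', 'Nurse']; // assumed list of known values

echo extractStatus('Name: John Smith Occupation: Doctor Currently Busy Gender: Male', $occupations), "\n";
// Currently Busy
echo extractStatus('Name: John Smith Occupation: Doctor Not Yet Here Gender: Male', $occupations), "\n";
// Not Yet Here
```

PCRE accepts alternatives of different lengths inside a lookbehind as long as each alternative has a fixed length, which is why "Doctor " and "Nurse " can sit side by side here.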

Upvotes: 1
