Tony Montana

Reputation: 1018

PHP preg_match_all not returning desired output

I am using preg_match_all to search for a given keyword in a string and, if it is found, to pick a few words before and after that keyword. This is the preg_match_all call I am using:

preg_match_all('~\b(?:[^ ]+ ){0,'.$prev.'}'.trim($keyword).'(?: [^ ]+){0,'.$next.'}\b~i',$text,$output);

Here $keyword is the keyword, $prev and $next are the numbers of words to pick before and after it, $text is the main string, and $output is the resulting array. So if my string is the one below:

PROFIT & LOSS NOFORMING P 152 22. ADDITIONAL INFORMATION: A) AUDITORS REMUNERATION (EXCLUDING SERVICE TAX) (` in crores) ParticularsCurrent yearPrevious year As audit fees (including limited review) 3.45 2.42

With the keyword "Audit Fee", I get the desired output, like this:

EXCLUDING SERVICE TAX) (` in crores) ParticularsCurrent yearPrevious year As audit fees (including limited review) 3.45 2.42

But in the string below, where there is no space between my keyword and the next word, it returns only the few words before the keyword and not the words after it.

PROFIT & LOSS NOFORMING P 152 22. ADDITIONAL INFORMATION: A) AUDITORS REMUNERATION (EXCLUDING SERVICE TAX) (` in crores) ParticularsCurrent yearPrevious year As audit fees(including limited review) 3.45 2.42

It just returns

EXCLUDING SERVICE TAX) (` in crores) ParticularsCurrent yearPrevious year As audit fees

Kindly guide me on how to get the following words as well when there is no space between my keyword and its next word.

Upvotes: 1

Views: 39

Answers (1)

Wiktor Stribiżew

Reputation: 626861

If you are only worried about the words after the keyword, you need to make the whitespace (or non-word characters) in front of each following chunk optional (zero or more):

'~\b(?:\S+\s+){0,10}Audit Fees(?:\s*\S+){0,5}\b~i'

See this regex demo

This makes the whitespace between the non-whitespace chunks after the keyword optional (\s* matches zero or more whitespace characters).

Pattern details:

  • \b - leading word boundary
  • (?:\S+\s+){0,10} - zero to ten sequences of 1+ non-whitespace symbols followed by 1+ whitespaces
  • Audit Fees - literal keyword
  • (?:\s*\S+){0,5} - zero to five sequences of 0+ whitespace symbols followed by 1+ non-whitespace symbols
  • \b - trailing word boundary
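
To make the difference concrete, here is a minimal sketch that runs the question's pattern and the revised pattern side by side. It assumes a shortened excerpt of the question's second string and illustrative values $prev = 2 and $next = 5. The variable names $strict and $loose are hypothetical and not part of the original code.

```php
<?php
// Shortened excerpt of the question's second string (no space after the keyword).
$text = "year As audit fees(including limited review) 3.45 2.42";

// Assumed values, chosen only for illustration.
$keyword = 'Audit Fees';
$prev = 2;
$next = 5;

// Question's pattern: each following word must be preceded by one literal space.
$strict = '~\b(?:[^ ]+ ){0,' . $prev . '}' . $keyword . '(?: [^ ]+){0,' . $next . '}\b~i';

// Revised pattern: \s* makes the whitespace before each following chunk optional.
$loose = '~\b(?:\S+\s+){0,' . $prev . '}' . $keyword . '(?:\s*\S+){0,' . $next . '}\b~i';

preg_match($strict, $text, $strictMatch);
preg_match($loose, $text, $looseMatch);

echo $strictMatch[0], "\n"; // year As audit fees
echo $looseMatch[0], "\n";  // year As audit fees(including limited review) 3.45 2.42
```

The strict pattern stops right after the keyword because "(including" is not preceded by a space, while the revised pattern keeps consuming chunks up to the $next limit.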

PHP demo:

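$prev = 10;            // max. number of words to capture before the keyword
$keyword = "Audit Fee";
$next = 5;             // max. number of chunks to capture after the keyword
$text = "PROFIT & LOSS NOFORMING P 152 22. ADDITIONAL INFORMATION: A) AUDITORS REMUNERATION (EXCLUDING SERVICE TAX) (` in crores) ParticularsCurrent yearPrevious year As audit fees(including limited review) 3.45 2.42";
// Build the pattern; the i flag makes the keyword match case-insensitively
$re = '~\b(?:\S+\s+){0,' . $prev . '}' . trim($keyword) . '(?:\s*\S+){0,' . $next . '}\b~i';
preg_match_all($re, $text, $output);
print_r($output);      // $output[0] holds the full matches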
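
If the keyword comes from user input, it may contain regex metacharacters such as "(" or ".", which would change or break the pattern. The following is a hedged sketch of one way to wrap the demo in a function; keyword_context is a hypothetical name, and escaping the keyword with preg_quote is an addition not made in the demo above.

```php
<?php
// Hypothetical helper (not part of the original answer): returns every
// snippet with up to $prev words before and $next chunks after $keyword.
function keyword_context(string $text, string $keyword, int $prev, int $next): array
{
    // Escape regex metacharacters in the keyword, including the ~ delimiter.
    $quoted = preg_quote(trim($keyword), '~');
    $re = '~\b(?:\S+\s+){0,' . $prev . '}' . $quoted . '(?:\s*\S+){0,' . $next . '}\b~i';
    preg_match_all($re, $text, $output);
    return $output[0]; // full matches only
}

$text = "As audit fees(including limited review) 3.45 2.42";

// Plain keyword, as in the demo.
$plain = keyword_context($text, "Audit Fee", 1, 2);
print_r($plain);

// Keyword containing a metacharacter; without preg_quote the unbalanced
// "(" would make the pattern invalid.
$special = keyword_context($text, "fees(including", 1, 1);
print_r($special);
```

The int type hints also ensure that $prev and $next are numeric before they are concatenated into the quantifiers.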

Upvotes: 1
