Reputation: 436
I am new to PHP, and sorry if this has already been answered here; I searched many posts without success, hence this question.
I have a large block of text and want to return its first 250 characters, then continue to the end of the current sentence. My attempt so far:
$output= preg_replace('/([^?!.]*.).*/', '\\1', substr($string, 250));
Can someone please help me in right direction? Thanks.
Upvotes: 0
Views: 2655
Reputation: 71422
There is no need for regex here; simple string manipulation is a better fit. The problem boils down to finding the first period followed by a space at or after offset 249 of the string. You can search for '. ' with strpos(), starting at offset 249. A function to do this might look like this:
function get_text_blurb_to_sentence_end($input_text, $ideal_length = 250) {
    if (strlen($input_text) <= $ideal_length) {
        return $input_text;
    } else {
        // strpos() takes the haystack first, then the needle, then the start offset
        $end_of_sentence = strpos($input_text, '. ', $ideal_length - 1);
        if (false === $end_of_sentence) {
            // no end of sentence found, just return $ideal_length characters
            return substr($input_text, 0, $ideal_length);
        } else {
            // + 2 keeps the period and the space that follows it
            return substr($input_text, 0, $end_of_sentence + 2);
        }
    }
}
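The same offset-search idea can be extended to sentences that end in '?' or '!'. The following is a minimal sketch, not the answer's own code: the function name blurb_to_sentence_end and its parameters are hypothetical, and it swaps strpos() for strcspn(), which returns the length of the run of characters, starting at a given offset, that contains none of the listed terminators. Unlike the function above, it does not require a space after the terminator, so an abbreviation such as "e.g." would also end the blurb.

```php
<?php
// Hypothetical helper (illustrative name and parameters, not from the answer above).
function blurb_to_sentence_end(string $text, int $idealLength = 250): string
{
    if (strlen($text) <= $idealLength) {
        return $text;
    }
    $offset = $idealLength - 1;
    // Number of characters from $offset up to (not including) the next '.', '?' or '!'
    $run = strcspn($text, '.?!', $offset);
    $end = $offset + $run;
    if ($end >= strlen($text)) {
        // No terminator found: fall back to a hard cut at $idealLength
        return substr($text, 0, $idealLength);
    }
    // Include the terminator itself
    return substr($text, 0, $end + 1);
}

// Prints 260 'a' characters followed by '?'
echo blurb_to_sentence_end(str_repeat('a', 260) . '? Next sentence.'), "\n";
```

Note that strlen() and substr() count bytes, so with multibyte text the cut-off is 250 bytes rather than 250 characters.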
Upvotes: 1
Reputation: 12341
This is not a full regex solution, but it may work for you:
$foo = 'This is an example paragraph. It has many sentences.';
// Split the paragraph $foo after each '.', '?' or '!',
// so every sentence keeps its own punctuation
$bar = preg_split('/(?<=[.?!])/', $foo, -1, PREG_SPLIT_NO_EMPTY);
$bas = '';
foreach ($bar as $bax) {
    // Concatenate each sentence
    $bas .= $bax;
    if (strlen($bas) >= 250) {
        // Once the output string reaches 250 characters,
        // don't concatenate any more sentences
        break;
    }
}
// Final paragraph
var_dump($bas);
Upvotes: 1
Reputation: 141887
Assuming you can delimit the end of a sentence by one of '.', '?', or '!':
$output = preg_replace('/(^.{0,249}[^!?.]*.).*$/s', '$1', $string);
(Added the s modifier to work with multiline strings.)
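To see how the pattern behaves, it helps to try it with a much smaller limit on a short input. This is only a sketch: the wrapper first_sentences and its $limit parameter are hypothetical names introduced for illustration, and the pattern is the one above with 249 replaced by $limit - 1.

```php
<?php
// Hypothetical wrapper around the pattern above; $limit stands in for 250.
function first_sentences(string $string, int $limit = 250): string
{
    // .{0,N} consumes up to $limit - 1 characters, [^!?.]* runs to the
    // next terminator, and the final . consumes that terminator.
    $pattern = '/(^.{0,' . ($limit - 1) . '}[^!?.]*.).*$/s';
    // preg_replace() returns null on error; fall back to the input then
    return preg_replace($pattern, '$1', $string) ?? $string;
}

$sample = 'One two three. Four five six? Seven eight nine!';
echo first_sentences($sample, 20), "\n"; // One two three. Four five six?
```

With a limit of 20, the first 19 characters end inside the second sentence, so the result runs on to that sentence's question mark.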
Upvotes: 2
Reputation: 10302
This works:
$output = preg_replace("/^(.{250})([^\.]*\.)(.*)$/", "\\1\\2", $text);
The regex has three capture groups, anchored at both ends of the string:
^            # Beginning of the string
(.{250})     # 250 characters of anything
([^\.]*\.)   # Any number of non-periods, followed by a single period
(.*)         # Anything
$            # End of the string
Then preg_replace just replaces the entire string with the first two parts.
Input:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla pharetra dignissim mauris, pretium viverra justo tempus at. Mauris nisl lectus, accumsan pretium ipsum ac, fringilla vehicula tellus. Proin ante mauris, consequat sed mollis id, euismod ac turpis. Mauris tellus massa, volutpat sit amet lectus at, imperdiet mollis lacus. Praesent dapibus, lacus vel egestas convallis, magna metus pharetra mi, a fringilla odio quam eu lacus. Nulla congue quam nisi, sed posuere sapien interdum posuere. Etiam in nibh felis. Sed ac ipsum ut velit dapibus mollis. Mauris ut ante ante. Pellentesque at posuere libero, sed posuere risus.
Output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla pharetra dignissim mauris, pretium viverra justo tempus at. Mauris nisl lectus, accumsan pretium ipsum ac, fringilla vehicula tellus. Proin ante mauris, consequat sed mollis id, euismod ac turpis.
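One property of this pattern is worth checking on small inputs: it only matches when the text has at least 250 characters followed somewhere by a period, and when it does not match, preg_replace returns the input unchanged. The sketch below assumes a hypothetical helper truncate_at_period with a $limit parameter in place of the hard-coded 250. It also adds the s modifier, which is not part of the pattern above, so that . matches newlines as well.

```php
<?php
// Hypothetical helper; $limit stands in for the hard-coded 250.
function truncate_at_period(string $text, int $limit = 250): string
{
    // Same three groups as above, plus the s modifier (an addition here)
    $pattern = '/^(.{' . $limit . '})([^.]*\.)(.*)$/s';
    // preg_replace() returns null on error; fall back to the input then
    return preg_replace($pattern, '$1$2', $text) ?? $text;
}

echo truncate_at_period('Hello world. Goodbye world. The end.', 10), "\n"; // Hello world.
echo truncate_at_period('Short.', 10), "\n"; // Short. (no match, input unchanged)
```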
Upvotes: 1