user704988

Reputation: 436

PHP Regex to find first n characters and finish it until end of sentence

I am new to PHP, and sorry if someone has already answered this here. I searched many posts without success, hence this question.

I have a large block of text and want output that contains the first 250 characters and then continues to the end of the current sentence.

$output= preg_replace('/([^?!.]*.).*/', '\\1', substr($string, 250));

Can someone please point me in the right direction? Thanks.

Upvotes: 0

Views: 2655

Answers (4)

Mike Brant

Reputation: 71422

No regex is needed here; simple string manipulation is a better fit. The problem boils down to finding the first period followed by a space at or after offset 249 of the string, which strpos can do when given a starting offset. A function to do this might look like this:

function get_text_blurb_to_sentence_end ($input_text, $ideal_length = 250) {
    if (strlen($input_text) <= $ideal_length) {
        // text is already short enough
        return $input_text;
    } else {
        // strpos takes the haystack first, then the needle, then the offset
        $end_of_sentence = strpos($input_text, '. ', $ideal_length - 1);
        if (false === $end_of_sentence) {
            // no end of sentence found just return $ideal_length characters
            return substr($input_text, 0, $ideal_length);
        } else {
            // include the period but not the space that follows it
            return substr($input_text, 0, $end_of_sentence + 1);
        }
    }
}
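The function above only looks for a period followed by a space. As a rough sketch of how the same offset-based idea could cover `?` and `!` as well, strcspn can report how far the next terminator is from a given offset. The function name `blurb_to_any_sentence_end` is hypothetical, and unlike the version above this sketch does not require a space after the terminator, so it would also cut at the dot in something like "3.14".

```php
<?php
// Hypothetical helper (not part of the answer above): cut at the first
// '.', '?' or '!' found at or after offset $ideal_length - 1.
function blurb_to_any_sentence_end(string $text, int $ideal_length = 250): string
{
    if (strlen($text) <= $ideal_length) {
        return $text;
    }
    $start = $ideal_length - 1;
    // strcspn() counts the characters from $start that are NOT terminators
    $end = $start + strcspn($text, '.?!', $start);
    if ($end >= strlen($text)) {
        // No terminator found: fall back to a hard cut
        return substr($text, 0, $ideal_length);
    }
    // Include the terminator itself
    return substr($text, 0, $end + 1);
}

// A small limit keeps the sample readable
echo blurb_to_any_sentence_end('Is this short? Yes it is! And here is more.', 10), "\n";
// prints "Is this short?"
```

Both versions count bytes rather than characters, so multibyte text may be cut slightly earlier than expected.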

Upvotes: 1

federicot

Reputation: 12341

This is not a full RegEx solution, but it may work for you:

$foo = 'This is an example paragraph. It has many sentences.';

// Split the paragraph $foo after each sentence terminator,
// so every piece keeps its own ., ? or !
$bar = preg_split('/(?<=[.?!])/', $foo, -1, PREG_SPLIT_NO_EMPTY);

$bas = '';
foreach ($bar as $bax) {
    // Concatenate each sentence
    $bas .= $bax;

    if (strlen($bas) >= 250) {
        // Once the output string reaches 250 characters
        // don't concatenate any more sentences
        break;
    }
}

// Final paragraph
var_dump($bas);

Upvotes: 1

Paul

Reputation: 141887

Assuming you can delimit the end of a sentence by one of ., ?, or !:

$output = preg_replace('/(^.{0,249}[^!?.]*.).*$/s', '$1', $string);


(Added s modifier to work with multiline strings).
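To see the pattern at work on a short input, here is a small sketch that wraps it in a function. The name `truncate_at_sentence` and the `$limit` parameter are assumptions made for illustration; the pattern itself is the one from the answer, with 249 replaced by `$limit - 1`.

```php
<?php
// Hypothetical wrapper: same pattern as above, with the 249 made configurable.
function truncate_at_sentence(string $text, int $limit = 250): string
{
    $max = max(0, $limit - 1);
    // Up to $max characters, then everything up to and including
    // the next sentence terminator; the rest of the string is dropped.
    $pattern = '/(^.{0,' . $max . '}[^!?.]*.).*$/s';
    // preg_replace() returns null on error, so fall back to the input
    return preg_replace($pattern, '$1', $text) ?? $text;
}

echo truncate_at_sentence('One two three. Four five six? Seven eight.', 20), "\n";
// prints "One two three. Four five six?"
```

Without the `u` modifier the dot counts bytes, not characters; adding `u` makes it count UTF-8 characters instead, which may matter for non-ASCII text.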

Upvotes: 2

Thomas Kelley

Reputation: 10302

This works:

$output = preg_replace("/^(.{250})([^\.]*\.)(.*)$/", "\\1\\2", $text);

The RegEx has three capture groups, anchored to the start and end of the string:

^            # Beginning of the string
(.{250})     # 250 characters of anything
([^\.]*\.)   # Any number of non-periods, followed by a single period
(.*)         # Anything
$            # End of the string

preg_replace then replaces the entire string with just the first two captured groups.
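The commented breakdown can also be run more or less as written by using the `x` (extended) modifier, which makes PCRE ignore whitespace and `#` comments inside the pattern. The sketch below is only an illustration: it uses `{20}` instead of `{250}` to keep the sample short, and it adds the `s` modifier (an assumption, not part of the pattern above) so that the dot also matches newlines.

```php
<?php
// The breakdown above as a real pattern, using the x and s modifiers.
$pattern = '/
    ^            # Beginning of the string
    (.{20})      # 20 characters of anything (250 in the answer)
    ([^.]*\.)    # Any number of non-periods, followed by a single period
    (.*)         # Anything
    $            # End of the string
/xs';

$text = 'Alpha beta gamma. Delta epsilon zeta. Eta theta.';

// $m[1] . $m[2] is what the replacement "\\1\\2" keeps;
// if the pattern does not match, the text is left unchanged.
$result = preg_match($pattern, $text, $m) ? $m[1] . $m[2] : $text;

echo $result, "\n";
// prints "Alpha beta gamma. Delta epsilon zeta."
```

Because `(.{20})` requires exactly that many characters, a shorter input does not match at all and comes back unchanged, which is also how the preg_replace call behaves.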

Input:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla pharetra dignissim mauris, pretium viverra justo tempus at. Mauris nisl lectus, accumsan pretium ipsum ac, fringilla vehicula tellus. Proin ante mauris, consequat sed mollis id, euismod ac turpis. Mauris tellus massa, volutpat sit amet lectus at, imperdiet mollis lacus. Praesent dapibus, lacus vel egestas convallis, magna metus pharetra mi, a fringilla odio quam eu lacus. Nulla congue quam nisi, sed posuere sapien interdum posuere. Etiam in nibh felis. Sed ac ipsum ut velit dapibus mollis. Mauris ut ante ante. Pellentesque at posuere libero, sed posuere risus.

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla pharetra dignissim mauris, pretium viverra justo tempus at. Mauris nisl lectus, accumsan pretium ipsum ac, fringilla vehicula tellus. Proin ante mauris, consequat sed mollis id, euismod ac turpis.

https://eval.in/44807

Upvotes: 1
