v123shine

Reputation: 13

Making the first letter of every sentence in a paragraph uppercase?

I got this function from php.net to convert text to sentence case (everything lowercase except the first letter of each sentence).

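function sentence_case($string) {
    // Split on runs of . ? ! and keep the punctuation. Assuming the text starts
    // with a sentence, even keys hold sentence text and odd keys hold punctuation.
    $sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        $new_string .= ($key & 1) == 0
            ? ucfirst(strtolower(trim($sentence)))  // sentence: lowercase, then capitalize the first character
            : $sentence . ' ';                     // punctuation: keep it and append a space
    }
    return trim($new_string);
}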

If the text is not wrapped in HTML, everything works well. But if it is inside a paragraph, the first letter after an opening paragraph (<p>) or line break (<br>) tag becomes lowercase.
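To reproduce the problem in isolation, here is a minimal script that runs the function on the same (shortened) text with and without a <p> wrapper. The only assumption is the shortened sample string; the function body is unchanged.

```php
<?php
// sentence_case() exactly as given above.
function sentence_case($string) {
    $sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        $new_string .= ($key & 1) == 0
            ? ucfirst(strtolower(trim($sentence)))
            : $sentence . ' ';
    }
    return trim($new_string);
}

// Plain text: the first chunk starts with "L", so ucfirst() capitalizes it.
echo sentence_case('Lorem IPSUM is simply dummy text. LOREM ipsum!'), PHP_EOL;
// Lorem ipsum is simply dummy text. Lorem ipsum!

// Wrapped in <p>: the first chunk starts with "<", so ucfirst() has nothing to change.
echo sentence_case('<p>Lorem IPSUM is simply dummy text. LOREM ipsum!</p>'), PHP_EOL;
// <p>lorem ipsum is simply dummy text. Lorem ipsum! </p>
```

The wrapped case also leaves a space before </p>, because the function appends ' ' after every delimiter, including the final "!".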

This is the sample:

Before:

<p>Lorem IPSUM is simply dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>

Output:

<p>lorem ipsum is simply dummy text. Lorem ipsum is simply dummy text! What is lorem ipsum? Hello lorem ipsum!</p>

Can someone help me make the first letter of the paragraph a capital letter?

Upvotes: 1

Views: 1247

Answers (4)

mickmackusa

Reputation: 47900

When parsing valid HTML, it is best practice to use a proper DOM parser. Regex is not reliable here because it cannot tell the difference between a real tag and a substring that merely resembles one.

Code:

$html = <<<HTML
<p>Lorem IPSUM is simply dummy text.<br>Here is dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>
HTML;

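libxml_use_internal_errors(true); // collect libxml parse errors instead of emitting warnings
$dom = new DOMDocument();
// Don't add implied <html>/<body> wrappers or a default doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Visit text nodes only, so tag names and attributes are never touched.
foreach($xpath->query('//text()') as $textNode) {
    $textNode->nodeValue = preg_replace_callback(
        // A lowercase word at the start of the node or after . ! ? (plus optional spaces);
        // \K drops the punctuation and spaces from the match so only the word is replaced.
        '/(?:^|[.!?]) *\K[a-z]+/',
        function($m) {
            return ucfirst($m[0]);
        },
        strtolower($textNode->nodeValue)
    );
}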
echo $dom->saveHTML();

Output:

<p>Lorem ipsum is simply dummy text.<br>Here is dummy text. Lorem ipsum is simply dummy text! What is lorem ipsum? Hello lorem ipsum!</p>

The above snippet does not:

  1. allow acronyms to remain all-caps (because the OP wants to convert all letters to lowercase before making select letters uppercase)
  2. properly handle multibyte characters (because the OP does not indicate this necessity)
  3. distinguish between a mid-sentence dot and a sentence-ending dot (due to ambiguity in English punctuation)
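Limitations 1 and 3 are easy to see by running the snippet's regex callback on a plain string. The helper name sentence_case_text() is hypothetical; it simply applies the same preg_replace_callback() call without the DOM wrapper:

```php
<?php
// Hypothetical helper: the same regex callback as the snippet, applied to a plain string.
function sentence_case_text(string $text): string {
    return preg_replace_callback(
        '/(?:^|[.!?]) *\K[a-z]+/',
        function ($m) {
            return ucfirst($m[0]);
        },
        strtolower($text)
    );
}

// "MR." is treated as a sentence end (limitation 3) and "NASA" is lowercased (limitation 1).
echo sentence_case_text('I met MR. SMITH at NASA.'), PHP_EOL;
// I met mr. Smith at nasa.
```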

Upvotes: 0

Ankur Garg

Reputation: 605

Try this:

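function html_ucfirst($s) {
    // Group 1 captures any leading tags; the last group is the remaining text,
    // whose first character is passed through ucfirst().
    return preg_replace_callback('#^((<(.+?)>)*)(.*?)$#', function ($c) {
        return $c[1] . ucfirst(array_pop($c));
    }, $s);
}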

and call it like this:

$string = "<p>Lorem IPSUM is simply dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>";
echo html_ucfirst($string);

Here is a working demo: https://ideone.com/fNq3Vo
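On its own, html_ucfirst() only uppercases the first letter after the leading tags; it does not lowercase the rest of the text. Assuming the intent is to combine it with the question's sentence_case() (the linked demo may do something different), a minimal sketch:

```php
<?php
// sentence_case() from the question: lowercase everything, capitalize after . ? !
function sentence_case($string) {
    $sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        $new_string .= ($key & 1) == 0
            ? ucfirst(strtolower(trim($sentence)))
            : $sentence . ' ';
    }
    return trim($new_string);
}

// html_ucfirst() from this answer: keep leading tags, ucfirst() the rest.
function html_ucfirst($s) {
    return preg_replace_callback('#^((<(.+?)>)*)(.*?)$#', function ($c) {
        return $c[1] . ucfirst(array_pop($c));
    }, $s);
}

$string = "<p>Lorem IPSUM is simply dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>";
// Sentence-case first, then fix the letter hidden behind the opening tag.
echo html_ucfirst(sentence_case($string)), PHP_EOL;
// <p>Lorem ipsum is simply dummy text. Lorem ipsum is simply dummy text! What is lorem ipsum? Hello lorem ipsum! </p>
```

The space before </p> comes from sentence_case(), which appends a space after every delimiter.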

Upvotes: 0

LSerni

Reputation: 57408

Your problem is that the HTML is treated as part of the sentence, so the first "word" of the sentence is <p>lorem rather than lorem, and ucfirst() leaves it unchanged because its first character is "<".

You can change the regexp to /([>.?!]+)/, but then you'll see an extra space before "Lorem": the function now sees two sentences ("<p" and "Lorem ...") instead of one, and it appends a space after every delimiter.
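A quick sketch of that change shows where the extra space comes from. The function name sentence_case_split_on_gt() is hypothetical; it is the question's function with only the split pattern altered:

```php
<?php
// Hypothetical copy of the question's sentence_case() that also splits on ">".
function sentence_case_split_on_gt($string) {
    $sentences = preg_split('/([>.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        $new_string .= ($key & 1) == 0
            ? ucfirst(strtolower(trim($sentence)))
            : $sentence . ' '; // ">" is now a delimiter too, so it also gets a trailing space
    }
    return trim($new_string);
}

echo sentence_case_split_on_gt('<p>Lorem IPSUM is simply dummy text. LOREM ipsum!</p>'), PHP_EOL;
// <p> Lorem ipsum is simply dummy text. Lorem ipsum! </p>
```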

Also, Hello <em>there</em> will now be split at every ">", so "there" is treated as the start of a new sentence.

This looks disturbingly like a case of "How can I use regexp to interpret (X)HTML"?

Upvotes: 0

yasarui

Reputation: 6573

You can do it easily with CSS, as long as only the rendered text needs to change (the HTML string itself is not modified):

p::first-letter {
    text-transform: uppercase;
}

Upvotes: -1
