Reputation: 946
Basically, I'm taking in a paragraph filled with all kinds of punctuation, such as ! ? . ; ", and splitting it into sentences. The issue I'm facing is coming up with a way to split it into sentences with the punctuation intact while also accounting for quotations in dialogue.
For instance, the paragraph:
One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don't know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.
would need to be split up like this:
[0] One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin.
[1] "What has happened!?" he asked himself.
[2] "I... don't know." said Samsa, "Maybe this is a bad dream."
And so on.
Currently I am just using explode:
$sentences = explode(".", $sourceWork);
This only splits on periods, and I append one to the end of each piece. I know that is far from what I want, but I'm not quite sure where to even start handling this. If someone could at least point me in the right direction for ideas, that would be amazing.
Thanks in advance!
Upvotes: 2
Views: 1673
Reputation: 309
You need to go through your string manually and do the splitting yourself. Keep track of the quotation mark count: if it is an odd number, you are inside a quotation, so do not break. Here is a simple idea:
<?php
//$str = 'AAA. BBB. "CCC." DDD. EEE. "FFF. GGG. HHH".';
$str = 'One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don\'t know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.';
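$last_dot = 0;   // offset where the current sentence starts
$quotation = 0;  // number of double quotes seen so far
$explode_list = array();
for ($i = 0; $i < strlen($str); $i++)
{
    $char = substr($str, $i, 1); // get the current character
    if ($char == '"') $quotation++; // track quotation
    if ($quotation % 2 == 1) continue; // odd count: inside a quotation, so do not break
    if ($char == '.')
    {
        echo "char is $char $last_dot<br/>"; // debug output
        $explode_list[] = substr($str, $last_dot, $i + 1 - $last_dot);
        $last_dot = $i + 1;
    }
}
// keep any trailing text that does not end with a period
if ($last_dot < strlen($str)) $explode_list[] = substr($str, $last_dot);
echo "testing:<pre>";
print_r($explode_list);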
Upvotes: 0
Reputation: 174957
Here's what I have:
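<?php
/**
 * @param string $str String to split
 * @param string $end_of_sentence_characters Characters which represent the end of the sentence. Should be a string with no spaces (".!?")
 *
 * @return array
 */
function split_sentences($str, $end_of_sentence_characters) {
    $inside_quotes = false;
    $buffer = "";
    $result = array();
    // Escape the characters so that "]", "^", "-" or "/" cannot break the character class
    $pattern = "/[" . preg_quote($end_of_sentence_characters, "/") . "]/";
    for ($i = 0; $i < strlen($str); $i++) {
        $buffer .= $str[$i];
        if ($str[$i] === '"') {
            $inside_quotes = !$inside_quotes;
        }
        // Only end a sentence on a terminator that is outside of quotes
        if (!$inside_quotes && preg_match($pattern, $str[$i])) {
            $result[] = $buffer;
            $buffer = "";
        }
    }
    // Keep any trailing text that has no terminator after it
    if (trim($buffer) !== "") {
        $result[] = $buffer;
    }
    return $result;
}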
$str = <<<STR
One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don't know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.
STR;
var_dump(split_sentences($str, "."));
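One limitation of toggling on quotes alone: a sentence that ends inside a quotation (such as "Maybe this is a bad dream.") is never closed, so it runs into the next sentence. Below is a rough sketch of one way to handle that, not a complete tokenizer. The function name split_sentences_lookahead is made up for illustration, and the rule it relies on is an assumption: a sentence ends at a terminator, or at a closing quote right after a terminator, only when what follows is whitespace plus a capital ASCII letter or an opening quote (or the end of the text).

```php
<?php
/**
 * Hypothetical variant of split_sentences(); a heuristic sketch only.
 *
 * @param string $str         String to split
 * @param string $terminators Characters that may end a sentence
 *
 * @return array
 */
function split_sentences_lookahead($str, $terminators = ".!?") {
    $inside_quotes = false;
    $buffer = "";
    $result = array();
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];
        $buffer .= $char;

        if ($char === '"') {
            $inside_quotes = !$inside_quotes;
        }
        if ($inside_quotes) {
            continue; // never break in the middle of a quotation
        }

        // Candidate boundary: a terminator, or a closing quote directly after one
        $is_terminator = strpos($terminators, $char) !== false;
        $closes_sentence = $char === '"' && $i > 0
            && strpos($terminators, $str[$i - 1]) !== false;
        if (!$is_terminator && !$closes_sentence) {
            continue;
        }

        // Assumption: the next sentence starts with whitespace followed by
        // a capital letter or an opening quote; \G anchors at the offset
        $at_end = ($i + 1 === $length);
        if ($at_end || preg_match('/\G\s+[A-Z"]/', $str, $m, 0, $i + 1)) {
            $result[] = trim($buffer);
            $buffer = "";
        }
    }

    // Keep any trailing text that has no terminator after it
    if (trim($buffer) !== "") {
        $result[] = trim($buffer);
    }
    return $result;
}

$demo = 'It was dark. "What has happened!?" he asked himself. "I... don\'t know." said Samsa, "Maybe this is a bad dream." He lay still.';
print_r(split_sentences_lookahead($demo));
```

The capital-letter check is what keeps "I... don't know." said Samsa together, since "said" starts with a lowercase letter. The same check will wrongly split after abbreviations such as "Mr. Smith", so treat this as a starting point.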
Upvotes: 3
Reputation: 2000
preg_split('/[.?!]/',$sourceWork);
It's a very simple regular expression, but I think your task is impossible.
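Note that splitting on the punctuation characters themselves removes them from the result. If the punctuation has to stay, one option is to split on the whitespace after it, using a lookbehind and a lookahead. This is only a heuristic sketch: the helper name split_sentences_regex is made up, and the pattern assumes a new sentence begins with a capital ASCII letter or an opening double quote.

```php
<?php
// Hypothetical helper; the pattern is a heuristic, not a full sentence grammar.
function split_sentences_regex($text) {
    // Split on whitespace that follows . ? or ! (optionally followed by a
    // closing quote) and precedes a capital letter or an opening quote.
    $pattern = '/(?<=[.?!]|[.?!]")\s+(?=[A-Z"])/';
    return preg_split($pattern, trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

$text = 'It was dark. "What has happened!?" he asked himself. "I... don\'t know." said Samsa, "Maybe this is a bad dream." He lay on his back.';
print_r(split_sentences_regex($text));
```

Because nothing is consumed except the whitespace, each piece keeps its own punctuation and quotes. Abbreviations followed by a capitalized word ("Dr. Jones") would still be split, so a general solution needs more than one regular expression.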
Upvotes: 0