Reputation: 29
I'm really stuck with this one program... I'm learning how to program and I'm starting with PHP right now. I need to get titles out of articles. I already asked this question, and I managed to get the first title of the text in several ways. For example, if the text was:
Hello
I'm learning how to write this code.
then I could get the "Hello" part like this:
<?php
$string = "Hello
I'm learning how
to write this code.";
$str=strstr($string,"\n",true);
echo $str . "<br />";
?>
However, there can be many titles in an article, each separated from the text by blank lines above and below, and I cannot manage to get all of them.
Here's what I tried:
<?php
$string="
Good text

Good text is good but I have no idea
how to code this.

Another title

I need to get you,
but don't know how.";
$finda="\n";
$get = substr($string, strpos($string, $finda), -1);
$getFinal=strstr($get, $finda, true);
echo $getFinal;
?>
But this doesn't work because there is a "\n" after every line. How can I identify only the blank lines? I tried to find them:
$getRow = explode("\n", $string);
foreach($getRow as $row){
    if(strlen($row) <= 1){
        // a blank line was found here
    }
}
but I don't know what to do next. Do you have any ideas? Can you help?
Thank you in advance:)
Upvotes: 0
Views: 77
Reputation: 1017
You can use a regular expression like this:
<?php
$string="
Good text

Good text is good but I have no idea
how to code this.

Another title

I need to get you,
but don't know how.";
preg_match_all('/^\n(.+?)\n\n/m', $string, $matches);
var_dump($matches[1]);
?>
Outputs:
array(2) {
  [0] =>
  string(9) "Good text"
  [1] =>
  string(13) "Another title"
}
Regular expressions are a compact way to describe constraints on a string, either to check that it matches a given pattern or to capture some of its parts. In this case, we want to capture some parts of the string (the titles).
'/^\n(.+?)\n\n/m' is the regular expression used to solve your problem. The actual expression is between the slashes, while the trailing m is a modifier. It turns on multiline mode, so that ^ matches at the beginning of every line instead of only at the beginning of the whole string.
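To see what the m modifier changes, here is a minimal sketch. The sample text and variable names are made up for the illustration; only preg_match_all and the modifier itself come from the answer.

```php
<?php
// Hypothetical sample text: two lines separated by a newline.
$text = "first\nsecond";

// Without the m modifier, ^ matches only at the very start of the string.
$withoutM = preg_match_all('/^\w+/', $text, $a);

// With the m modifier, ^ also matches right after every newline.
$withM = preg_match_all('/^\w+/m', $text, $b);

echo $withoutM, "\n";              // 1
echo $withM, "\n";                 // 2
echo implode(', ', $b[0]), "\n";   // first, second
```

preg_match_all returns the number of full matches, which is why the two counts differ.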
We are left with ^\n(.+?)\n\n which can be read from left to right.
^ indicates the beginning of a line and \n represents the "new line" character. Coupled (^\n), they represent an empty line.
Parentheses indicate what we want to capture. In this case, the title, which can be any number of any characters. The . represents any character except a newline and the + indicates that we want any number of occurrences of that character (but at least one; the * can be used to allow zero occurrences). The ? makes the + lazy: instead of capturing as much as possible, it stops at the first opportunity to match the remaining part of the regular expression. Since . does not match a newline by default, the capture cannot run past the end of the title line in any case.
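The difference between a greedy + and a lazy +? is easier to see on a single line. This is only an illustrative sketch; the input string and variable names are invented for the example.

```php
<?php
// Hypothetical input with two bracketed parts on one line.
$line = '[one] and [two]';

// Greedy: .+ grabs as much as it can, so it runs up to the LAST closing bracket.
preg_match('/\[(.+)\]/', $line, $greedy);

// Lazy: .+? stops at the FIRST closing bracket it can reach.
preg_match('/\[(.+?)\]/', $line, $lazy);

echo $greedy[1], "\n"; // one] and [two
echo $lazy[1], "\n";   // one
```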
Then, the two \n represent the end of the title line and the end of the empty line following it.
As we used preg_match_all instead of preg_match, every occurrence of the pattern will be matched instead of the first one only.
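As a quick comparison of the two functions, the sketch below runs the same pattern through both. The sample article is written with explicit \n escapes so the blank lines are visible; the variable names $first and $all are just placeholders.

```php
<?php
// Sample article: each title has a blank line before the text that follows it,
// and the second title also has a blank line above it.
$string = "\nGood text\n\nGood text is good but I have no idea\nhow to code this.\n\n"
        . "Another title\n\nI need to get you,\nbut don't know how.";

// preg_match stops after the first match.
preg_match('/^\n(.+?)\n\n/m', $string, $first);

// preg_match_all keeps scanning and collects every match.
preg_match_all('/^\n(.+?)\n\n/m', $string, $all);

echo $first[1], "\n";               // Good text
echo implode(', ', $all[1]), "\n";  // Good text, Another title
```

Note that the pattern assumes Unix line endings (\n). Text with Windows line endings (\r\n) would need the pattern adapted.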
Regular expressions are really powerful and I invite you to learn more about them.
Upvotes: 1
Reputation: 36
While iterating over the lines, you could have a variable that stores what you are currently doing. What I mean is that you could have 3 states: processing_text, expecting_title, got_title.
Each time you find that $row == "" (meaning the line was empty) while processing text, you set your variable to expecting_title. If the variable is expecting_title, you store or echo the next row you encounter and set the variable to got_title. This way, when you encounter the next empty line, you won't set the variable to expecting_title, but to processing_text.
Some pseudocode to get you started:
foreach ($getRow as $row)
    if (state == expecting_title)
        processTitle($row)
        state=got_title
    if ($row == "")
        if (state == processing_text)
            state=expecting_title
        else
            state=processing_text
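A possible PHP version of that pseudocode could look like the sketch below. The function name extractTitles and the string state values are my own choices, not anything built into PHP. It also assumes a title is always a single line, and that extra blank lines before a title should simply be skipped.

```php
<?php
// Hypothetical helper: collects every line that directly follows a blank line
// and is itself followed by a blank line (title / blank / text layout).
function extractTitles(string $text): array
{
    $titles = [];
    $state = 'processing_text';

    foreach (explode("\n", $text) as $row) {
        $row = rtrim($row, "\r"); // tolerate Windows line endings

        if ($row === '') {
            if ($state === 'processing_text') {
                // A blank line after body text announces a title.
                $state = 'expecting_title';
            } elseif ($state === 'got_title') {
                // A blank line after a title starts the body text.
                $state = 'processing_text';
            }
            // Still expecting a title: extra blank line, keep waiting.
        } elseif ($state === 'expecting_title') {
            $titles[] = $row;
            $state = 'got_title';
        }
    }

    return $titles;
}

$string = "\nGood text\n\nGood text is good but I have no idea\nhow to code this.\n\n"
        . "Another title\n\nI need to get you,\nbut don't know how.";

echo implode(', ', extractTitles($string)), "\n"; // Good text, Another title
```

Because the initial state is processing_text, a title at the very top of the text is only picked up if a blank line precedes it, as in the sample string.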
Or, you can always use regex, as the other answer mentioned, but that's another story.
Upvotes: 0