Getting titles out of string

Question

I'm really stuck with this one program... I'm learning how to program and I'm starting with PHP right now. I need to get titles out of articles. I already asked this question, and I managed to get the first title of the text in many ways. For example, if the text was:

Hello

I'm learning how to write this code.

then I could get the "Hello" part, for example like this:

    ";
    ?>

However, there can be a lot of titles in the article. Each one of them is separated by blank lines from the text above and below it, and I cannot manage to get all of these titles.

Here's what I tried:

But this doesn't work because there is a "\n" after every line. How can I identify only the blank lines? I tried to find them:

    $getRow = explode("\n", $string);
    foreach ($getRow as $row) {
        if (strlen($row) <= 1) {
            // $row is a blank line (possibly holding a stray "\r")
        }
    }

but I don't know what to do next. Do you have any ideas? Can you help?

Thank you in advance:)

olivier · Accepted Answer

You can use a regular expression like this:
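A minimal sketch of this approach, assuming the pattern '/^\n(.+?)\n\n/m' that the explanation below walks through and Unix line endings (\n). The sample text in $string and the helper name extractTitles are illustrative assumptions, not taken from the original post:

```php
<?php
// Hypothetical sample article: each title has a blank line above and below it.
$string = "Some intro paragraph.\n\nGood text\n\nBody of the first section.\n\nAnother title\n\nBody of the second section.\n";

// Hypothetical helper: returns every line matched between two blank lines.
function extractTitles(string $text): array
{
    // ^\n     an empty line (line start immediately followed by a newline)
    // (.+?)   the title itself, captured as group 1
    // \n\n    end of the title line, then the empty line that follows it
    preg_match_all('/^\n(.+?)\n\n/m', $text, $matches);

    return $matches[1];
}

var_dump(extractTitles($string));
```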

Outputs:

    array(2) {
      [0] =>
      string(9) "Good text"
      [1] =>
      string(13) "Another title"
    }

Explanation of the regular expression

Regular expressions are a compact way to describe constraints on a string, either to check that it matches a given pattern or to capture some of its parts. In this case, we want to capture some parts of the string (the titles).

'/^\n(.+?)\n\n/m' is the regular expression used to solve your problem. The actual expression is between the slashes, while the trailing m is a modifier. It puts the expression in multi-line mode, so that ^ matches at the beginning of every line instead of only at the beginning of the string.

We are left with ^\n(.+?)\n\n, which can be read from left to right.

^ indicates the beginning of a line and \n represents the "new line" character. Coupled (^\n), they represent an empty line.
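To see what ^ matches with and without the m modifier, here is a small sketch. The input string and the variable names are made up for illustration:

```php
<?php
// Hypothetical two-line input.
$text = "first line\nsecond line";

// Without m, ^ only matches at the very start of the string.
preg_match_all('/^\w+/', $text, $withoutM);

// With m, ^ also matches right after every newline inside the string.
preg_match_all('/^\w+/m', $text, $withM);

print_r($withoutM[0]); // one match: "first"
print_r($withM[0]);    // two matches: "first" and "second"
```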

The parentheses mark what we want to capture: in this case the title, which can be any number of any characters. The . represents any character (except a newline, by default) and the + indicates that we want any number of occurrences of it (but at least one; * can be used to allow zero occurrences). The ? makes the + lazy: we don't want to go too far and capture the whole string, so matching stops at the first opportunity to match the remaining part of the regular expression.
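The difference between .+ and .+? is easiest to see on a single line with several possible stopping points. A small sketch, using a made-up input string and a simpler pattern than the one in this answer:

```php
<?php
// Hypothetical input with two quoted words on one line.
$text = 'say "hello" and "bye"';

// Greedy: .+ grabs as much as it can, so it runs up to the last quote.
preg_match('/"(.+)"/', $text, $greedy);

// Lazy: .+? stops at the first place where the rest of the pattern matches.
preg_match('/"(.+?)"/', $text, $lazy);

echo $greedy[1], "\n"; // hello" and "bye
echo $lazy[1], "\n";   // hello
```

In the title pattern itself, . cannot cross a newline because the s modifier is not set, so the capture never extends past the end of its line. The lazy quantifier there acts as a safeguard more than a necessity.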

Then, the two \n represent the end of the title line and the end of the empty line following it.

As we used preg_match_all instead of preg_match, every occurrence of the pattern will be matched instead of the first one only.
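The two functions can be compared side by side. This is only a sketch: the sample text and variable names are assumptions, and the pattern is the one discussed in this answer:

```php
<?php
// Hypothetical sample text with two titles.
$text = "Intro.\n\nFirst title\n\nSome body text.\n\nSecond title\n\nMore body text.\n";
$pattern = '/^\n(.+?)\n\n/m';

// preg_match stops after the first match; $first[1] is a single string.
preg_match($pattern, $text, $first);

// preg_match_all keeps going; $all[1] is an array with one entry per match.
preg_match_all($pattern, $text, $all);

echo $first[1], "\n";      // First title
echo count($all[1]), "\n"; // 2
```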

Regular expressions are really powerful and I invite you to learn them further.
