Haradzieniec

Reputation: 9338

Get first paragraph of article from Wikipedia

What is the proper way to get the first paragraph of the Wikipedia article for the word Hollywood? As a result, the $result variable should contain the first paragraph of the page:

Hollywood is a district in Los Angeles, California, United States situated west-northwest of downtown Los Angeles.[2] Due to its fame and cultural identity as the historical center of movie studios and movie stars, the word Hollywood is often used as a metonym of American cinema. Even though much of the movie industry has dispersed into surrounding areas such as West Los Angeles and the San Fernando and Santa Clarita Valleys, significant auxiliary industries, such as editing, effects, props, post-production, and lighting companies remain in Hollywood, as does the backlot of Paramount Pictures.

It is OK if it contains HTML tags (even better than just the plain text).

Upvotes: 0

Views: 1917

Answers (2)

svick

Reputation: 244948

I have no idea what Kohana is, but to get the HTML text of a certain Wikipedia page, you can use the API.

For example, to get the HTML of the first section of the Hollywood article, you would use a query like:

http://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&titles=Hollywood&rvprop=content&rvsection=0&rvparse

This returns XML; pass format=json instead of format=xml to get JSON.

Also, this returns the whole first section (including the infobox), not just the first paragraph.
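As a rough sketch of how the query above could be used from PHP: the helpers below build the same URL with format=json, pull the parsed HTML out of the response, and keep only the first non-empty paragraph. The function names are hypothetical, and the response layout (query, pages, page ID, revisions, then the "*" key) is an assumption about what this query returns in JSON. The paragraph extraction uses PHP's built-in DOM extension.

```php
<?php
// Hypothetical helper: same parameters as the URL above, but asking for JSON.
function wikipediaFirstSectionUrl(string $title): string
{
    $params = [
        'format'    => 'json',
        'action'    => 'query',
        'prop'      => 'revisions',
        'titles'    => $title,
        'rvprop'    => 'content',
        'rvsection' => 0,
        'rvparse'   => '',
    ];
    return 'http://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Hypothetical helper: assumes the HTML sits at
// query -> pages -> <page id> -> revisions -> 0 -> "*".
function sectionHtmlFromResponse(string $json): ?string
{
    $data  = json_decode($json, true);
    $pages = $data['query']['pages'] ?? [];
    $page  = reset($pages); // the page ID key is not known in advance
    return $page['revisions'][0]['*'] ?? null;
}

// Returns the first <p> that has visible text, skipping the infobox
// and any empty paragraphs before it.
function firstParagraphHtml(string $sectionHtml): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    // The XML declaration is a common trick to make loadHTML() read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"?><div>' . $sectionHtml . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('p') as $p) {
        if (trim($p->textContent) !== '') {
            return $doc->saveHTML($p);
        }
    }
    return null;
}

// Usage (fetching is left to whatever HTTP client the project uses):
// $json   = file_get_contents(wikipediaFirstSectionUrl('Hollywood'));
// $html   = sectionHtmlFromResponse($json);
// $result = $html === null ? null : firstParagraphHtml($html);
```

The result keeps its HTML tags, which the question says is acceptable.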

Upvotes: 3

Armon

Reputation: 358

You can use the Simple HTML DOM library to easily parse HTML from webpages:

require_once 'inc/simple_html_dom.php'; // replace this line with the Kohana way of including the library

// Create a DOM object from the URL; file_get_html() returns false on failure
$html = file_get_html('http://en.wikipedia.org/wiki/Hollywood');
if ($html === false) {
    die('Could not load the page');
}

// Get the first <p> element on the page
$p = $html->find('p', 0);

echo $p->innertext; // Prints <b>Hollywood</b> is a district in (...)

I've never used Kohana, but there seem to be at least two Kohana modules for Simple HTML DOM, so it should be easy to use the library in your project.
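One caveat with taking the very first p element of a whole page is that it may be empty or sit outside the article body. A possible variation, sketched here with PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM, limits the search to the article container and skips paragraphs without text. The function name is hypothetical, and the container ID mw-content-text is an assumption about Wikipedia's page markup.

```php
<?php
// Hypothetical helper: first non-empty <p> inside the given container.
// $containerId is assumed to be a plain ID without quote characters.
function firstContentParagraph(string $pageHtml, string $containerId = 'mw-content-text'): ?string
{
    if ($pageHtml === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $doc->loadHTML($pageHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // normalize-space() is empty for paragraphs that hold only whitespace,
    // so the predicate filters those out.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="' . $containerId . '"]//p[normalize-space()]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $doc->saveHTML($nodes->item(0));
}

// Usage, assuming $pageHtml already holds the downloaded page:
// echo firstContentParagraph($pageHtml);
```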

Upvotes: 1
