Harsha M V

Reputation: 54949

Querying Content from Wikipedia

I am trying to fetch the first paragraph of a Wikipedia article using the following script. When I query with multiple words, it doesn't work.

<?php

$query = urlencode($_GET['query']);

$url = "http://en.wikipedia.org/w/api.php?action=parse&page=$query&format=json&prop=text&section=0";
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript"); // required by wikipedia.org server; use YOUR user agent with YOUR contact information. (otherwise your IP might get blocked)
$c = curl_exec($ch);

$json = json_decode($c);

$content = $json->{'parse'}->{'text'}->{'*'}; // get the main text content of the query (it's parsed HTML)

// pattern for first match of a paragraph
$pattern = '#<p>(.*)</p>#Us'; // http://www.phpbuilder.com/board/showthread.php?t=10352690
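$cont = ''; // stays empty if no paragraph is found, so the code below never reads an undefined variable
if (preg_match($pattern, $content, $matches))
{
    // print $matches[0]; // content of the first paragraph (including wrapping <p> tag)
    $cont = strip_tags($matches[1]); // Content of the first paragraph without the HTML tags.
}

// remove bracketed [...] and parenthesised (...) fragments, including nested ones
$pattern = '/\[([^\[\]]|(?R))*]|\(([^()]|(?R))*\)/';
echo $my = preg_replace($pattern, '', $cont);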

?>

Demo 1: Bangalore

Demo 2: Los Angeles

Is there any way to query Wikipedia for results and select the first result by default?

Upvotes: 0

Views: 110

Answers (1)

Valentin Mercier

Reputation: 5326

You need to url encode your query string before passing it to curl.

<?php $query = urlencode($_GET['query']); ?>

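EDIT: I tried your code, and it worked once I replaced the spaces with the '+' character. In my test the URL encoding did not work because the spaces ended up as '%20'. (PHP's urlencode() normally emits '+' for a space; rawurlencode() is the function that emits '%20'.)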

Try this:

$query = str_replace(' ', '+', $_GET['query']);
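To see what each approach does to a space, here is a minimal sketch. `$title` is a hypothetical stand-in for `$_GET['query']`, and `http_build_query()` is shown only as a possible alternative that encodes every parameter at once; it is not what was tested in this answer.

```php
<?php
// Hypothetical input standing in for $_GET['query'].
$title = 'Los Angeles';

// urlencode() uses form-style encoding: a space becomes '+'.
$plus = urlencode($title);

// rawurlencode() follows RFC 3986: a space becomes '%20'.
$percent = rawurlencode($title);

// The replacement suggested above only touches spaces;
// other special characters (e.g. '&') are left unencoded.
$replaced = str_replace(' ', '+', $title);

// http_build_query() encodes all values (spaces as '+' by default).
$url = 'http://en.wikipedia.org/w/api.php?' . http_build_query(array(
    'action'  => 'parse',
    'page'    => $title,
    'format'  => 'json',
    'prop'    => 'text',
    'section' => 0,
));

echo $plus, "\n", $percent, "\n", $replaced, "\n", $url, "\n";
```

Since `str_replace()` leaves characters such as `&` or `#` untouched, a title containing them would likely still break the URL, which is why an encoding function may be the safer choice.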

Here is the output I get with Los Angeles and New Delhi

iMac-de-Valentin:so valentin$ php so.php
Los Angeles , officially the City of Los Angeles, often known by its initials L.A., is the most populous city in the U.S. state of California and the second-most populous in the United States, after New York City, with a population at the 2010 United States Census of 3,792,621. It has a land area of 469 square miles , and is located in Southern California.
iMac-de-Valentin:so valentin$ php so.php
New Delhi i/ˈnjuː dɛli/ is the capital of India and seat of the executive, legislative, and judiciary branches of the Government of India. It is also the centre of the Government of the National Capital Territory of Delhi. New Delhi is situated within the metropolis of Delhi and is one of the eleven districts of Delhi National Capital Territory.
iMac-de-Valentin:so valentin$
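As for selecting the first result by default, one possible approach (not part of the test above) is to ask the MediaWiki search module (`action=query&list=search`) for the best-matching title first, and then pass that title to `action=parse`. The sketch below only builds the request URL and extracts the first title; the function names `buildSearchUrl` and `firstSearchTitle` are hypothetical, and `$sample` is a hand-written response in the expected shape, not real API output.

```php
<?php
// Build a search request URL for the MediaWiki API (hypothetical helper).
function buildSearchUrl($terms)
{
    return 'http://en.wikipedia.org/w/api.php?' . http_build_query(array(
        'action'   => 'query',
        'list'     => 'search',
        'srsearch' => $terms,
        'srlimit'  => 1,      // only the top hit is needed
        'format'   => 'json',
    ));
}

// Return the title of the first search hit, or null if there is none
// (hypothetical helper; expects the raw JSON body of the response).
function firstSearchTitle($json)
{
    $data = json_decode($json, true);
    if (!isset($data['query']['search'][0]['title'])) {
        return null;
    }
    return $data['query']['search'][0]['title'];
}

// Hand-written sample in the shape of a search response; an assumption,
// not captured from the live API.
$sample = '{"query":{"search":[{"title":"Los Angeles"}]}}';

echo buildSearchUrl('los angeles'), "\n";
echo firstSearchTitle($sample), "\n";
```

In the original script, the URL from `buildSearchUrl()` would be fetched with the same curl calls, and the returned title would then be used as the `page` parameter.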

Upvotes: 1
