Is there a better way to fetch the text content of particular sections from Wikipedia? I have the code below to skip some sections, but the process is taking too long to fetch the data I'm looking for.
for ($i = 0; $i > 10; $i++) {
    if ($i != 2 || $i != 4) {
        $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section=' . $i;
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_USERAGENT, "TestScript");
        $c = curl_exec($ch);
        $json = json_decode($c);
        $content = $json->{'parse'}->{'text'}->{'*'};
        print preg_replace('/<\/?a[^>]*>/', '', $content);
    }
}
For starters, your loop condition is $i > 10, which is false from the very first check (since $i starts at 0), so the loop body never runs at all. Change it to $i < 10, or if you need only a handful of sections, try:
foreach (array(1, 3, 5, 6, 7) as $i) {
    // your code
}
Second, decoding the JSON into an associative array, like this:
$json = json_decode($c, true);
and referencing it as $json['parse']['text']['*'] is easier to work with, but that's up to you.
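For a side-by-side illustration of the two access styles (a minimal sketch, not tied to a live response):

// Object access, as in the question; ->{'*'} is needed because * is not a valid property name:
$json = json_decode($c);
$content = $json->parse->text->{'*'};

// Associative-array access:
$json = json_decode($c, true);
$content = $json['parse']['text']['*'];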
And third, you'll find that strip_tags() will likely work faster and more accurately than stripping tags with regular expressions.
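Putting all three fixes together, a minimal sketch might look like the following (the page name and the TestScript user agent are carried over from the question; the section list is just an example, and error handling is omitted for brevity):

// Sections to fetch; 2 and 4 are skipped, per the original intent.
$sections = array(0, 1, 3, 5, 6, 7, 8, 9);

foreach ($sections as $i) {
    $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section=' . $i;

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, "TestScript");
    $c = curl_exec($ch);
    curl_close($ch);

    // Decode into an associative array for simpler key access.
    $json = json_decode($c, true);
    if (isset($json['parse']['text']['*'])) {
        // strip_tags() removes all HTML markup, not just anchor tags.
        print strip_tags($json['parse']['text']['*']);
    }
}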