Reputation: 2246
I have a Wikipedia URL in some language other than English, for example:
http://ru.wikipedia.org/wiki/Liz_Claiborne,_Inc
I want to convert this URL to the corresponding English Wikipedia URL, i.e.
http://en.wikipedia.org/wiki/Liz_Claiborne,_Inc
What is the most effective way of doing this?
I tried searching for ".wikipedia" in the string and replacing the two characters before it with "en".
But what if the input is simply:
http://wikipedia.org/wiki/Liz_Claiborne,_Inc
How do I handle all the cases?
I hope my question is clear. Any help would be appreciated.
Upvotes: 0
Views: 272
Reputation: 13283
This either replaces the existing language subdomain or adds one if it is missing:
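<?php
$urls = array(
    'http://wikipedia.org',
    'http://ru.wikipedia.org',
    'http://en.wikipedia.org',
);
// Right after the scheme, match a language subdomain such as "ru." or "be-x-old.",
// or the empty position if there is none, provided "wikipedia.org" follows.
$regex = '/(?<=^http:\/\/|^https:\/\/)(?:[a-z-]+\.|\b)(?=wikipedia\.org)/i';
$change = 'de';
echo '<pre>';
foreach ($urls as $url) {
    echo preg_replace($regex, "$change.", $url), "\n";
}
echo '</pre>';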
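For comparison, here is a minimal sketch of the same host rewrite done with PHP's built-in parse_url() instead of a regular expression. The function name toWikipediaLanguage is hypothetical. It accepts http or https, keeps the path, query string and fragment, and replaces the whole subdomain (so a mobile host such as ru.m.wikipedia.org would also become en.wikipedia.org). Like the regex, it only swaps the language subdomain and does not translate the page title.

```php
<?php
// Hypothetical helper: rewrite a Wikipedia URL's host to "<lang>.wikipedia.org".
// Returns null if the input is not an absolute URL on wikipedia.org.
function toWikipediaLanguage(string $url, string $lang): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not a parseable absolute URL
    }
    $host = strtolower($parts['host']);
    // Accept "wikipedia.org" itself or any "<sub>.wikipedia.org" host
    if ($host !== 'wikipedia.org' && substr($host, -14) !== '.wikipedia.org') {
        return null;
    }
    $scheme = $parts['scheme'] ?? 'http';
    $path = $parts['path'] ?? '/';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';
    return $scheme . '://' . $lang . '.wikipedia.org' . $path . $query . $fragment;
}

echo toWikipediaLanguage('http://ru.wikipedia.org/wiki/Liz_Claiborne,_Inc', 'en'), "\n";
echo toWikipediaLanguage('http://wikipedia.org/wiki/Liz_Claiborne,_Inc', 'en'), "\n";
```

Both calls print http://en.wikipedia.org/wiki/Liz_Claiborne,_Inc; a URL on any other host returns null instead of being rewritten.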
The problem with just changing the language subdomain, however, is that many of the resulting pages will not exist. What matters is the page title (the last path segment), and it differs in most languages:
http://en.wikipedia.org/wiki/Internet
http://fo.wikipedia.org/wiki/Alnet
http://gv.wikipedia.org/wiki/Eddyr-voggyl
All those pages are about the "Internet", but none of them would be accessible by simply changing the locale.
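One way to get the correct title, swapped in here and not used by any of the answers, is the MediaWiki Action API's langlinks query, which returns a page's interlanguage links. The sketch below assumes the formatversion=2 JSON shape, where each link has "lang" and "title" keys; the function names are hypothetical, and the sample response is hand-written in that shape rather than fetched, so no network access is needed to run it.

```php
<?php
// Hypothetical helper: build a langlinks query URL for a page on a given wiki.
function langlinksApiUrl(string $sourceLang, string $title, string $targetLang = 'en'): string
{
    return 'https://' . $sourceLang . '.wikipedia.org/w/api.php?' . http_build_query([
        'action' => 'query',
        'prop' => 'langlinks',
        'titles' => $title,
        'lllang' => $targetLang,
        'format' => 'json',
        'formatversion' => 2,
    ]);
}

// Hypothetical helper: pull the target-language title out of a decoded
// response, or return null if the page has no link to that language.
function targetTitleFromResponse(array $response, string $targetLang = 'en'): ?string
{
    foreach ($response['query']['pages'] ?? [] as $page) {
        foreach ($page['langlinks'] ?? [] as $link) {
            if (($link['lang'] ?? null) === $targetLang && isset($link['title'])) {
                return $link['title'];
            }
        }
    }
    return null;
}

// Hand-written, trimmed sample in the assumed formatversion=2 shape
$sample = json_decode(
    '{"query":{"pages":[{"title":"Alnet","langlinks":[{"lang":"en","title":"Internet"}]}]}}',
    true
);

echo langlinksApiUrl('fo', 'Alnet'), "\n";
$title = targetTitleFromResponse($sample);
if ($title !== null) {
    echo 'https://en.wikipedia.org/wiki/' . str_replace(' ', '_', $title), "\n";
}
```

In real use you would fetch langlinksApiUrl(...) over HTTP and pass the json_decode(..., true) result to targetTitleFromResponse(); Wikimedia asks API clients to send a descriptive User-Agent header.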
Upvotes: 2
Reputation: 1165
The name of the page varies by language, so you cannot simply guess the URL.
The only approach that works for all pages is to parse the Wikipedia page and read the href value of the matching "Other languages" link:
<li class="interwiki-en"><a href="__url__" title="__title__" hreflang="en" lang="en">English</a></li>
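A minimal sketch of that parsing step with PHP's DOMDocument and DOMXPath, assuming the page still marks interlanguage links with an li of class "interwiki-en" as shown above (Wikipedia's markup may change). The function name englishLinkFromHtml and the sample HTML are hypothetical; the sample uses a protocol-relative href, which the helper turns into an https URL.

```php
<?php
// Hypothetical helper: return the English interlanguage URL from page HTML,
// or null if no "interwiki-en" link is present.
function englishLinkFromHtml(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings (e.g. unknown HTML5 tags) instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "interwiki-en" as a whole class token, not as a substring
    $nodes = $xpath->query(
        '//li[contains(concat(" ", normalize-space(@class), " "), " interwiki-en ")]/a/@href'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $href = $nodes->item(0)->nodeValue;
    // Expand a protocol-relative href ("//en.wikipedia.org/...") to https
    return strpos($href, '//') === 0 ? 'https:' . $href : $href;
}

$html = '<ul><li class="interwiki-en"><a href="//en.wikipedia.org/wiki/Liz_Claiborne,_Inc" hreflang="en" lang="en">English</a></li></ul>';
echo englishLinkFromHtml($html), "\n";
```

For the sample above this prints https://en.wikipedia.org/wiki/Liz_Claiborne,_Inc.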
Upvotes: 1
Reputation: 9262
I would use a regular expression to grab the substring you are looking for. A simple working example:
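<?php
// Capture everything from "wikipedia.org/" onward, skipping any subdomain
$regex = '@^https?://(?:[^/]*\.)?(wikipedia\.org/.+)@i';
$url = 'http://ru.wikipedia.org/wiki/Liz_Claiborne,_Inc';
// Only build the new URL if the pattern matched; otherwise $matches[1] is unset
if (preg_match($regex, $url, $matches)) {
    echo 'http://en.' . $matches[1];
}
?>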
Upvotes: 1