Reputation: 23
I'm trying to get the different Wikipedia URLs (e.g. en.wikipedia.org/wiki/Page_Name) from a Wikidata ID using the API.
For example, given the URL http://www.wikidata.org/wiki/Q7349 I want to get links to the Wikipedia articles in all languages (en.wikipedia.org/wiki/Joseph_Haydn, es.wikipedia.org/wiki/Joseph_Haydn, etc.). At the moment I'm using https://github.com/freearhey/wikidata:
$wdAPI = new \Wikidata\Wikidata();
$resp = $wdAPI->entities('Q7349');
but then I don't know how to get the Wikipedia URLs from the object returned by entities(). I suppose this should be an easy task, but after a few hours I'm still unable to figure out how to do it. I would really appreciate it if someone with previous experience using the Wikipedia/Wikidata API could point me in the right direction :)
Upvotes: 1
Views: 1919
Reputation: 866
I have not worked with this particular library before, but its documentation is rather straightforward, so let's go through this together:
\Wikidata\Wikidata::entities() returns a Wikidata\Entity\Entity\EntityResponse.
Wikidata\Entity\Entity\EntityResponse has a get() method returning an array of Wikidata\Entity\Entity.
Wikidata\Entity\Entity does not seem to have any method returning the site links to related Wikipedia pages... dead end.
Based on this, it seems like this library is not suitable for your needs (as of 14 August 2015): it only implements basic entity data, while currently only items contain sitelinks. The library also does not use the data model offered by the official wikibase/data-model library. Using that one would make things easier, since it is the data model used by Wikibase, the MediaWiki extension that is Wikidata's underlying software. There you could simply use Wikibase\DataModel\Entity\Item::getSiteLinkList() to get a list of site links (as of version 0.4).
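In case you want a feel for what such a site link list holds before wiring up any API calls, here is a toy sketch. FakeSiteLink is a stand-in that only mirrors my understanding of the data model's SiteLink interface; the real class lives in wikibase/data-model and offers more (badges, etc.):

```php
// Stand-in mirroring the SiteLink accessors from wikibase/data-model
// (simplified; not the real class).
class FakeSiteLink {
    private $siteId;
    private $pageName;
    public function __construct( $siteId, $pageName ) {
        $this->siteId = $siteId;
        $this->pageName = $pageName;
    }
    public function getSiteId() { return $this->siteId; }
    public function getPageName() { return $this->pageName; }
}

// getSiteLinkList() would give you an iterable collection along these lines,
// keyed by global site ID ("enwiki", "eswiki", "enwiktionary", ...):
$siteLinks = [
    new FakeSiteLink( 'enwiki', 'Joseph Haydn' ),
    new FakeSiteLink( 'eswiki', 'Joseph Haydn' ),
    new FakeSiteLink( 'enwiktionary', 'Haydn' ),
];
foreach ( $siteLinks as $siteLink ) {
    echo $siteLink->getSiteId() . ' => ' . $siteLink->getPageName() . "\n";
}
```

Note that the site IDs are database names, not URLs, which is exactly why URL building becomes a separate problem below.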
An alternative library that does use the data model library mentioned above is addwiki/wikibase-api.
There is some documentation on the GitHub repo and some more on the Wikidata wiki itself ("Wikidata:Creating a bot").
From the examples on that page you can get a basic idea; reading some of the API documentation, you can build the following code:
use \Mediawiki\Api as MwApi;
use \Wikibase\Api as WbApi;
use \Wikibase\DataModel\SiteLink;
$api = new MwApi\MediawikiApi( "http://www.wikidata.org/w/api.php" );
$api->login( new MwApi\ApiUser( 'USER', 'PASSWORD' ) );
$wikidata = new WbApi\WikibaseFactory( $api );
// Get the current revision of item Q7349
$revision = $wikidata->newRevisionGetter()->getFromId( 'Q7349' );
/** @var \Wikibase\DataModel\Entity\Item $item */
$item = $revision->getContent()->getData();
/** @var \Wikibase\DataModel\SiteLinkList $itemSiteLinks */
$itemSiteLinks = $item->getSiteLinkList();
So, $itemSiteLinks will contain all the site links, not just those to Wikipedia sites but also to Wiktionary and others. Also, we do not have the URLs yet. Unfortunately the library does not offer a way to build the links out of the box. Instead we have to query the Wikidata API directly for information about all sites and then build the links from that information.
/**
* @param MwApi\MediawikiApi $mwApi
* @param string[] $projectTypes The desired projects, e.g. [ "Wikipedia", "Wiktionary" ]
* @return string[] Project's ID as key, url string as value.
*/
function getProjectUrls( MwApi\MediawikiApi $mwApi, $projectTypes ) {
$urls = [];
// TODO: Could optimize this request with additional parameters:
$siteMatrix = $mwApi->postRequest( new \Mediawiki\Api\SimpleRequest( 'sitematrix' ) )[ 'sitematrix' ];
foreach( $siteMatrix as $key => $wmProjectsByLang ) {
if( !is_numeric( $key ) ) {
continue; // not a project but meta info (e.g. "count")
}
foreach( $wmProjectsByLang[ 'site' ] as $mwProject ) {
if( in_array( $mwProject[ 'sitename' ], $projectTypes ) ) {
$urls[ $mwProject[ 'dbname' ] ] = $mwProject[ 'url' ];
}
}
}
return $urls;
}
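Since the shape of the sitematrix response is easy to get wrong, here is a heavily abbreviated, hardcoded sample (my reading of the API's output; real responses list hundreds of languages plus a "specials" group) run through the same filtering logic getProjectUrls() uses, with no API request involved:

```php
// Abbreviated, hardcoded sample of the 'sitematrix' part of the response:
$siteMatrix = [
    'count' => 3, // meta info: skipped by the is_numeric() check below
    '0' => [
        'code' => 'en',
        'site' => [
            [ 'url' => 'https://en.wikipedia.org', 'dbname' => 'enwiki', 'sitename' => 'Wikipedia' ],
            [ 'url' => 'https://en.wiktionary.org', 'dbname' => 'enwiktionary', 'sitename' => 'Wiktionary' ],
        ],
    ],
    '1' => [
        'code' => 'es',
        'site' => [
            [ 'url' => 'https://es.wikipedia.org', 'dbname' => 'eswiki', 'sitename' => 'Wikipedia' ],
        ],
    ],
];

// Same filtering as in getProjectUrls(), minus the API request:
$urls = [];
foreach ( $siteMatrix as $key => $wmProjectsByLang ) {
    if ( !is_numeric( $key ) ) {
        continue; // meta info such as "count"
    }
    foreach ( $wmProjectsByLang['site'] as $mwProject ) {
        if ( in_array( $mwProject['sitename'], [ 'Wikipedia' ] ) ) {
            $urls[ $mwProject['dbname'] ] = $mwProject['url'];
        }
    }
}
// $urls now maps 'enwiki' and 'eswiki' to their base URLs;
// the Wiktionary entry has been filtered out.
```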
/**
* @param SiteLink $siteLink
* @param array $sitesInfo
* @return null|string
*/
function buildSiteLinkUrl( SiteLink $siteLink, array $sitesInfo ) {
$siteId = $siteLink->getSiteId();
if( !array_key_exists( $siteId, $sitesInfo ) ) {
return null;
}
$baseUrl = $sitesInfo[ $siteId ];
$titlePart = urlencode( str_replace( ' ', '_', $siteLink->getPageName() ) );
return "$baseUrl/wiki/$titlePart";
}
$wikipediaSites = getProjectUrls( $api, [ 'Wikipedia' ] );
foreach( $itemSiteLinks as $siteLink ) {
$url = buildSiteLinkUrl( $siteLink, $wikipediaSites );
if( $url !== null ) {
echo "$url\n";
}
}
This should do the job, even though the second part is kind of hacky, since we made an assumption about how the wiki links are built. In theory there could be other URL schemas, but as far as I know all the Wikimedia wikis follow this one.
Anyhow, to build the URLs in a perfectly safe way, the sitematrix API module would have to provide information about the URL schemas in its response, but it does not.
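To double-check that URL-building assumption in isolation, the title handling can be exercised with hardcoded inputs. buildUrl() here is a hypothetical helper using the same string logic as buildSiteLinkUrl(), just without the SiteLink object:

```php
// Same title-to-URL logic as in buildSiteLinkUrl(), isolated:
// spaces become underscores, then the title is percent-encoded.
function buildUrl( $baseUrl, $pageName ) {
    $titlePart = urlencode( str_replace( ' ', '_', $pageName ) );
    return "$baseUrl/wiki/$titlePart";
}

echo buildUrl( 'https://en.wikipedia.org', 'Joseph Haydn' ) . "\n";
// https://en.wikipedia.org/wiki/Joseph_Haydn
```

Note that urlencode() leaves underscores and alphanumerics untouched, so plain titles come out unchanged, while special characters get percent-encoded.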
Upvotes: 5