Reputation: 169
When you open a Wikipedia category, you see its pages listed alphabetically under letter headings (A, B, C, etc.).
For example:
http://en.wikipedia.org/wiki/Category:Countries_in_Europe
But some related pages are listed under an asterisk (*), a dot (.), or simply at the top. How can I extract only the pages that are listed under letters?
Or maybe somebody can explain how these two kinds of entries (those under [*, .] and those under [A, B, C]) differ in the article code or in their category relationships.
Upvotes: 2
Views: 72
Reputation: 8530
When assigning categories to MediaWiki pages, say South Sudan, you can use the syntax [[Category:Countries|Sudan]] to make the page sort under Sudan rather than under the default (South Sudan). On Wikipedia, this is frequently used to put the “main page” of a category at the top of the category page, by giving it a sortkey such as *, -, or similar (the characters normally used, as well as the definition of a main page, vary depending on which Wikipedia edition you are looking at).
When asking the API for members of a category, use cmsort=sortkey to sort the results accordingly. Furthermore, you can use cmendsortkey to stop at a certain sortkey, e.g. 1 or A. Or you can print out the sortkeys and filter the list on your side, using cmprop=sortkeyprefix:
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics&cmsort=sortkey&cmprop=sortkeyprefix|title
This is all very well documented in the official MediaWiki documentation.
In the above example, the first five pages have a special sortkey (a space), which indicates that they are main pages of a sort for that category.
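The client-side filtering described above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the function names and the User-Agent string are made up for the example, an empty sortkeyprefix is taken to mean that the page sorts under its own title, and "under a letter" is approximated as "the effective sortkey starts with an alphabetic character". It also fetches a single batch of results and does not follow API continuation.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_category_members(category, limit=500):
    """Fetch one batch of category members with their sortkey prefixes.

    Hypothetical helper: it does not follow continuation, so large
    categories are only partially returned.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmsort": "sortkey",
        "cmprop": "sortkeyprefix|title",
        "cmlimit": str(limit),
        "format": "json",
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder chosen for this example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "category-sortkey-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["query"]["categorymembers"]


def pages_under_letters(members):
    """Keep only titles whose effective sortkey starts with a letter.

    Assumption: an empty or missing sortkeyprefix means the page sorts
    under its title, so the title is used as the fallback key.
    """
    titles = []
    for member in members:
        key = member.get("sortkeyprefix") or member["title"]
        if key[:1].isalpha():
            titles.append(member["title"])
    return titles
```

Something like pages_under_letters(fetch_category_members("Category:Countries in Europe")) would then drop the entries that were given a space, * or similar sortkey to push them to the top.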
Upvotes: 3