Vladimir Shevchenko

Reputation: 169

Get pages that are only under letters in category

When you open a Wikipedia category, you see pages that are organized alphabetically and listed under a letter (A, B, C, etc.).

For example

http://en.wikipedia.org/wiki/Category:Countries_in_Europe

But there are also some related pages that are listed under an asterisk (*), a dot (.), or just at the top. How can I extract only those pages that are under letters?

Or maybe somebody can explain what the difference is, in article code or category relationships, between these two kinds of entries (those under [*, .] and those under [A, B, C])...

Upvotes: 2

Views: 72

Answers (1)

leo

Reputation: 8530

When assigning categories to a MediaWiki page, say South Sudan, you can use the syntax [[Category:Countries|Sudan]] to make it sort under Sudan rather than under the default (South Sudan). On Wikipedia, this is frequently used to put the “main page” of a category at the top of the category page, by adding a sortkey like *, -, or similar (the characters normally used, as well as the definition of a main page, vary depending on which Wikipedia edition you are looking at).

When asking the API for members of a category, use cmsort=sortkey to sort the results by sortkey. Furthermore, you can use cmstartsortkeyprefix or cmendsortkeyprefix to start or stop at a certain sortkey prefix, e.g. 1 or A. Or you can print out the sortkeys with cmprop=sortkeyprefix and filter the list on your side: http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics&cmsort=sortkey&cmprop=sortkeyprefix|title

This is all very well documented in the official MediaWiki documentation.

In the above example, the first five pages have a special sortkey (a space), which indicates that they are some kind of main pages of that category.
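To filter on your side, something like the following rough Python sketch might work. It builds the same kind of query URL as above and keeps only members whose sortkey starts with a letter. The function names and the sample data are made up for illustration, and the fallback to the page title when sortkeyprefix is empty is an assumption about how the default sortkey behaves; cmlimit and format=json are added here so the response is easier to consume.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def build_query_url(category, limit=50):
    """Build a categorymembers query that returns titles and sortkey prefixes."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmsort": "sortkey",
        "cmprop": "sortkeyprefix|title",
        "cmlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def under_letter(member):
    """True if the member would be listed under a letter heading.

    Assumption: an empty sortkeyprefix means the page sorts by its title.
    """
    key = member.get("sortkeyprefix") or member.get("title", "")
    return key[:1].isalpha()


def letter_members(members):
    """Return the titles of members whose sortkey starts with a letter."""
    return [m["title"] for m in members if under_letter(m)]


# Hypothetical sample shaped like the "categorymembers" list in the API response.
sample = [
    {"title": "Physics", "sortkeyprefix": " "},
    {"title": "Outline of physics", "sortkeyprefix": "*"},
    {"title": "Acoustics", "sortkeyprefix": ""},
    {"title": "South Sudan", "sortkeyprefix": "Sudan"},
]

print(build_query_url("Category:Physics"))
print(letter_members(sample))
```

With real data you would fetch the URL, take query.categorymembers from the JSON response, and pass that list to letter_members. Large categories are paged, so you would also need to follow the continuation value the API returns.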

Upvotes: 3
