Reputation: 189
I'm new to Wikimedia, and I'm using the Wiki API for a project. My dataset looks like this:
rev_id | comment | timestamp | page_id | page_title | user_id | user_text
-- -- -- -- -- -- -- -- -- -- -- --
352194497 | Welcome to Wikipedia | 2010-03-26T18:16:48Z | 26709696 | 116.197.206.138 | 8356162 | Mlpearc
I'm trying to find some user information about the people who posted these comments. However, the "user_text" here seems to be the signature rather than the user name. If I use the official API demo get_users.py to get the information, I get an error because some signatures contain a space, while usernames are all single words. In the code below, I can get the information for Catrope and Bob using Catrope|Bob, but it doesn't work if I use Catrope|Tide rolls, where Tide rolls is the signature.
import requests
S = requests.Session()
URL = "https://en.wikipedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "format": "json",
    "list": "users",
    "ususers": "Catrope|Tide rolls",
    "usprop": "blockinfo|groups|editcount|registration|emailable|gender"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
USERS = DATA["query"]["users"]
for u in USERS:
    print(str(u["name"]) + " has " + str(u["editcount"]) + " edits.")
So my question is: is there any way to get user information from the signature using the API? And since we also have page_id and user_id here, would that information be helpful? Thank you so much in advance!
Updated: I originally used Bob Ben here as a fake ID. It has now been replaced by a real one. The problem was solved by using _ to replace the space. (Thanks to AXO for the reminder.)
Upvotes: 1
Views: 264
Reputation: 9086
You've not mentioned the error and traceback that you're getting. The code sample should work fine as long as the username exists, even if the username has a space in it.
But the user account "Bob Ben" is not registered. In such cases the API replies with {'name': 'Bob Ben', 'missing': ''}.
So your code could be:
for u in USERS:
    if 'missing' not in u:
        print(u["name"] + " has " + str(u["editcount"]) + " edits.")
    else:
        print(u["name"], "is not registered.")
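If you want something a bit more defensive, here is a minimal sketch of the same check wrapped in a function. The function name `describe_user` is hypothetical, the sample entries are made up for illustration, and the `invalid` key is my assumption about how the API marks names it cannot accept as usernames (for example IP addresses), so check it against a real response.

```python
def describe_user(u):
    """Return a one-line summary for one entry of DATA["query"]["users"]."""
    name = u.get("name", "<unknown>")
    if "missing" in u:
        # The account does not exist, e.g. {'name': 'Bob Ben', 'missing': ''}
        return name + " is not registered."
    if "invalid" in u:
        # Assumption: the API flags names that are not valid usernames this way
        return name + " is not a valid username."
    # editcount is only present when requested via usprop
    return name + " has " + str(u.get("editcount", "an unknown number of")) + " edits."


# Made-up sample entries shaped like the API reply, not real data
sample_users = [
    {"userid": 1, "name": "Catrope", "editcount": 100},
    {"name": "Bob Ben", "missing": ""},
]
for u in sample_users:
    print(describe_user(u))
```

With a real response you would pass DATA["query"]["users"] instead of `sample_users`.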
BTW, if for some reason you prefer not to use a space, you may use _ (underscore) instead. A blank space is equivalent to an underscore.
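Since you are reading names from a dataset, a small helper can build the ususers value for you. This is only a sketch: `build_ususers_batches` is a hypothetical name, and the batch size of 50 is an assumption based on the usual per-request limit for ordinary accounts, so adjust it if your account has a different limit.

```python
def build_ususers_batches(names, batch_size=50):
    """Turn a list of user names into pipe-separated strings for "ususers".

    Spaces are replaced by underscores (the two are equivalent), and the
    names are split into batches so that no request carries too many values.
    """
    cleaned = [n.strip().replace(" ", "_") for n in names if n.strip()]
    return [
        "|".join(cleaned[i:i + batch_size])
        for i in range(0, len(cleaned), batch_size)
    ]


print(build_ususers_batches(["Catrope", "Tide rolls"]))
# ['Catrope|Tide_rolls']
```

Each string in the returned list can be used as the "ususers" value of one request.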
Regarding "user information", I'm not sure what kind of information you're looking for. According to API:Users, one may get blockinfo|groups|groupmemberships|implicitgroups|rights|editcount|registration|emailable|gender|centralids|cancreate using the usprop parameter. But if some other information, for example the information on the user page, is to be fetched, then you'll perhaps need to use one of the methods mentioned in API:Get the contents of a page to get the contents of the user page and then write a program to look for the information you need.
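For the user page route, here is a rough sketch using the revisions method (one of several options). The helper names are hypothetical, and the response layout assumed in `extract_wikitext` is what I would expect with formatversion=2 and rvslots=main; please verify it against an actual reply before relying on it.

```python
URL = "https://en.wikipedia.org/w/api.php"


def user_page_params(username):
    """Build query parameters that ask for the wikitext of User:<username>."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "User:" + username,
    }


def extract_wikitext(data):
    """Pull the page text out of a reply, or return None if the page is absent.

    Assumes the formatversion=2 layout: query.pages is a list of page dicts.
    """
    pages = data.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing") or "revisions" not in pages[0]:
        return None
    return pages[0]["revisions"][0]["slots"]["main"]["content"]


def fetch_user_page(username):
    """Fetch the user page wikitext over the network (not called here)."""
    import requests  # imported here so the helpers above work without it
    reply = requests.get(url=URL, params=user_page_params(username))
    return extract_wikitext(reply.json())
```

Calling fetch_user_page("Tide rolls") would return the raw wikitext (or None), which you would then have to parse yourself for whatever details you need.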
Upvotes: 2