Reputation: 103
As of MediaWiki 1.31 there is no extension to mass delete spam users (only manual merge & delete). We would like to delete the users via MySQL, but there are warnings that this method may destroy your database because of references between tables. When deleting rows from the user table, are there ways to make sure no references are harmed? Any experiences or recommendations?
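One rough pre-check we can think of would be to list all columns in the other tables that look like user references before touching the user table. This is just a sketch: <wiki-databasename> is a placeholder, and as far as we know the MediaWiki MySQL schema has no real foreign key constraints, only naming conventions, so a pattern match on the column names is about all one can do:
select table_name, column_name
from information_schema.columns
where table_schema = '<wiki-databasename>'
and column_name like '%user%'
order by table_name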
Upvotes: 4
Views: 721
Reputation: 15604
Today I was facing this issue with an old MediaWiki 1.23 and searched a bit.
Based on the information above I experimented a bit.
First I wanted to assess the damage:
external links
select count(*) from externallinks
select convert(el_to using utf8) as href
from externallinks l
There were some 150,000 external links.
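To get a feel for where the spam links point, the links can also be grouped by host. The substring_index trick below is only a sketch and assumes el_to always starts with a protocol prefix like http://:
select substring_index(convert(el_to using utf8), '/', 3) as host,
count(*) as links
from externallinks
group by 1
order by 2 desc
limit 25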
SQL Query to find out about users
select
convert(user_name using utf8) as name,
convert(user_touched using utf8) as time,
user_editcount
from user
order by 2 desc
In my case all spam users were created in the same time period.
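Since they all showed up in one burst, a date range on user_registration narrows them down. The timestamps below are just placeholders for the period the previous query revealed, and this assumes user_registration is actually populated for those accounts:
select count(*) as spamusers
from user
where user_registration between '20140101000000' and '20140401000000'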
SQL query with a join over the page, revision, text and user tables:
select
convert(u.user_name using utf8) as username,
p.page_id,
convert(p.page_title using utf8) as pagetitle,
r.rev_user as userid,
convert(t.old_text using utf8) as text
from page p
inner join revision r
on p.page_id=r.rev_page
inner join user u
on r.rev_user=u.user_id
inner join text t
on r.rev_text_id=t.old_id
SQL query to find number of revisions per user:
select count(*),u.user_id,convert(u.user_name using utf8) as username
from revision r
inner join user u
on r.rev_user=u.user_id
group by 2
order by 1 desc
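Once the suspicious user ids are known, the page/revision/text join from above can be restricted to just their revisions to sample what would actually be deleted. The id list in the where clause is only a placeholder for your own "good" users:
select
convert(u.user_name using utf8) as username,
p.page_id,
convert(p.page_title using utf8) as pagetitle,
convert(t.old_text using utf8) as text
from page p
inner join revision r
on p.page_id=r.rev_page
inner join user u
on r.rev_user=u.user_id
inner join text t
on r.rev_text_id=t.old_id
where not r.rev_user in (1)
limit 50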
In my case all "good" pages were fortunately created by only one user with user_id=1, so I could assess the damage by:
select count(*) as textcount from text where old_id in (select rev_text_id from revision where not rev_user in (1));
giving me over half a million hits as a result, which means the deletion is better done in a stepwise way:
select count(*) as textcount from text where old_id in (select rev_text_id from revision where not rev_user in (1));
set autocommit=0;
start transaction;
delete from text where old_id in (select rev_text_id from revision where not rev_user in (1)) limit 2000;
commit;
Please note that the limit of 2000 already led to a runtime of some 2 minutes. So I'd have to run the above SQL statement some 250 times, waiting 2 minutes each time ...
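Rather than pasting the delete statement by hand some 250 times, a small loop can run it unattended. Here is a rough sketch as a MySQL stored procedure; the name purge_spam_text is my own invention and it keeps the assumption that user_id=1 is the only legitimate author:
delimiter //
create procedure purge_spam_text()
begin
  declare batchrows int default 1;
  while batchrows > 0 do
    -- delete the next batch of spam text rows
    delete from text
    where old_id in (select rev_text_id from revision where not rev_user in (1))
    limit 2000;
    set batchrows = row_count();
    -- commit after each batch so the transaction stays small
    commit;
  end while;
end //
delimiter ;
call purge_spam_text();
drop procedure purge_spam_text;
Each batch still takes its two minutes, but at least nobody has to sit there and paste.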
If you run into delete timing problems, you might consider the hints in:
You can check your table status with:
show table status from <wiki-databasename>;
In my case the tables were using InnoDB.
I tried increasing innodb_buffer_pool_size to 128 MByte, but this didn't have a positive effect. The deletion was still slow.
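For reference, the current value can be checked from SQL. Raising it on the fly only works on sufficiently new MySQL/MariaDB versions; on older ones it has to go into my.cnf and needs a server restart:
show variables like 'innodb_buffer_pool_size';
-- dynamic resize, only on newer MySQL/MariaDB versions
set global innodb_buffer_pool_size = 128 * 1024 * 1024;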
I'll still try to get this finished and work my way through by deleting the relevant rows in
I also checked the files in /var/lib/mysql/. Since I had innodb_file_per_table on, I saw that quite a few tables had grown very large.
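The same information is available from information_schema without poking around in the filesystem; data_free is only a rough indicator of how much space an optimize could reclaim:
select table_name,
round((data_length + index_length) / 1024 / 1024) as size_mb,
round(data_free / 1024 / 1024) as reclaimable_mb
from information_schema.tables
where table_schema = '<wiki-databasename>'
order by data_length + index_length desc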
So I looked into
and started
optimize table text
This took 8 hours to complete.
Fortunately in my case it's not really a production wiki. I just wanted to check the feasibility of the approach and it looks like it depends a lot on the number of rows involved.
The API- and maintenance-based approaches might be much more effective depending on the concrete scenario.
Upvotes: 3
Reputation: 1009
Did you try this: https://www.mediawiki.org/wiki/Manual:RemoveUnusedAccounts.php?
This is a built-in script that deletes unused accounts without any problems. Read the instructions in the link above.
Upvotes: -1