Tristan

Reputation: 103

Delete mediawiki spam users via mysql

As of MediaWiki 1.31 there is no extension to mass-delete spam users (only manual merge & delete). We would delete the users via MySQL, but there are warnings that this method may destroy your database because of referenced tables. When deleting rows from the user table, are there ways to make sure no references are harmed? Any experiences or recommendations?
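For example, in a stock MediaWiki 1.31 schema the user ID is referenced from several other tables, so a rough pre-check could count the rows that would be orphaned (a sketch; the user ID 42 is a placeholder, and the column names assume the pre-actor-migration schema):

select 'revision' as tbl, count(*) as refs from revision where rev_user = 42
union all
select 'logging', count(*) from logging where log_user = 42
union all
select 'recentchanges', count(*) from recentchanges where rc_user = 42;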

Upvotes: 4

Views: 721

Answers (2)

Wolfgang Fahl

Reputation: 15604

Today I was facing this issue with an old MediaWiki 1.23 and searched around a bit.

Based on the information above I experimented a bit.

First I wanted to assess the damage:

external links

select  count(*) from externallinks

select convert(el_to using utf8) as href 
from externallinks l

There were some 150,000 external links.

SQL query to find out about users:

select 
  convert(user_name using utf8) as name,
  convert(user_touched using utf8) as time,
  user_editcount 
from user 
order by 2 desc

In my case all SPAM users were created in the same time period.
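To see how the account creations cluster, a per-day breakdown of user_registration (a 14-digit timestamp string such as 20180315123456) can help; a sketch:

select left(convert(user_registration using utf8), 8) as day,
  count(*) as created
from user
group by 1
order by 1 desc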

SQL query with a join over the page, revision, text, and user tables:

select 
  convert(u.user_name using utf8) as username,
  p.page_id,
  convert(p.page_title using utf8) as pagetitle,
  r.rev_user as userid,
  convert(t.old_text using utf8) as text
from page p
inner join revision r
  on p.page_id=r.rev_page
inner join user u
  on r.rev_user=u.user_id  
inner join text t
  on r.rev_text_id=t.old_id

SQL query to find number of revisions per user:

select count(*),u.user_id,convert(u.user_name using utf8) as username
from revision r
inner join user u
on r.rev_user=u.user_id
group by 2
order by 1 desc 

In my case all "good" pages were fortunately created by a single user with user_id=1, so I could assess the damage with:

select count(*) as textcount from text where old_id in (select rev_text_id from revision where not rev_user in (1));

giving me over half a million hits as a result, which means deletion is better done stepwise:

select count(*) as textcount from text where old_id in (select rev_text_id from revision where not rev_user in (1)); 
set autocommit=0;
start transaction;
delete from text where old_id in (select rev_text_id from revision where not rev_user in (1)) limit 2000; 
commit;

Please note that the limit of 2000 already led to a runtime of some 2 minutes, so I'd have to run the above SQL statement some 250 times, waiting 2 minutes each ...
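Rather than re-running the statement by hand, the batches can be looped in a stored procedure; a sketch, assuming as above that user_id 1 is the only legitimate author (the procedure name is made up; adjust the batch size and the spam condition to your data):

delimiter //
create procedure delete_spam_text()
begin
  declare affected int default 1;
  while affected > 0 do
    -- delete one batch of spam text rows
    delete from text
    where old_id in (select rev_text_id from revision where not rev_user in (1))
    limit 2000;
    -- remember how many rows the batch removed, then commit it
    set affected = row_count();
    commit;
  end while;
end //
delimiter ;

call delete_spam_text();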

If you run into delete timing problems you might look for hints on speeding up bulk deletes.

You can check your table status with:

show table status from <wiki-databasename>;

In my case the tables were using InnoDB.

I tried increasing innodb_buffer_pool_size to 128 MB, but this didn't have a positive effect; the deletion was still slow.
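For reference, the current value can be checked like this, and as of MySQL 5.7 the variable can even be resized at runtime (older versions need a my.cnf change and a restart):

show variables like 'innodb_buffer_pool_size';

-- 128 MB; dynamic resizing assumes MySQL 5.7+
set global innodb_buffer_pool_size = 134217728;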

I'll still try to get this finished and work my way through by deleting the relevant rows in the following tables (a sketch follows the list):

  • externallinks
  • revision
  • page
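A rough sketch of those follow-up deletes, assuming again that user_id 1 is the only legitimate author:

-- 1. remove the spam revisions themselves
delete from revision where not rev_user in (1);

-- 2. remove external links recorded for pages that no longer have any revision
delete from externallinks
where el_from not in (select distinct rev_page from revision);

-- 3. remove the now orphaned page rows
delete from page
where page_id not in (select distinct rev_page from revision);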

I also checked the files in /var/lib/mysql/. Since I had innodb_file_per_table on, I saw that quite a few tables had grown very large.

So I looked into optimizing the tables and started

optimize table text

which took 8 hours to complete.

Fortunately in my case it's not really a production wiki. I just wanted to check the feasibility of the approach and it looks like it depends a lot on the number of rows involved.

The API and maintenance-based approaches might be much more effective depending on the concrete scenario.
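For example, once the spam page titles are collected in a file (spam_pages.txt is a placeholder, one title per line), the deleteBatch.php maintenance script deletes them through MediaWiki itself, which keeps all referencing tables consistent:

php maintenance/deleteBatch.php -u WikiSysop -r "spam cleanup" spam_pages.txt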

Upvotes: 3

ASammour

Reputation: 1009

Did you try this: https://www.mediawiki.org/wiki/Manual:RemoveUnusedAccounts.php?

This is a built-in script that deletes unused accounts without any problems. Read the instructions in the link above.
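By default the script only reports what it would remove; with --delete it actually removes the accounts:

php maintenance/removeUnusedAccounts.php
php maintenance/removeUnusedAccounts.php --delete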

Upvotes: -1
