Mophilly

Reputation: 143

How to compare two MediaWiki sites

We moved a private MediaWiki site to a new server. Some months later we discovered that one or two users had continued to update the old site, so the old server has some edits that need to be copied to the new one.

Does anyone know of a routine or process to (conveniently?) compare and identify edits in the old site?

Per the comments attached to this post, the Recent Changes page might work if it accepted a starting date. Unfortunately, it is limited to a maximum of 30 days, and in this case I need to review changes for 12 months.

Upvotes: 1

Views: 94

Answers (1)

Rainer Rillke

Reputation: 1321

Identify edits done

Identify and verify edits done by your users since the fork

Using the database (assuming MySQL) and no table prefixes

Give me all the edits done since Dec 01 2018 (including that date):

SELECT rev_id, rev_page, rev_text_id, rev_comment, rev_user, rev_user_text, rev_timestamp
FROM   revision
WHERE  rev_timestamp > '20181201';
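MediaWiki stores rev_timestamp as a 14-character UTC string (YYYYMMDDHHMMSS), so the WHERE clause above is a plain string comparison. The following minimal Python sketch (the helper name to_mw_timestamp is hypothetical) produces the full-length value for a cutoff date and shows why the shorter literal '20181201' still includes 1 December:

```python
from datetime import datetime, timezone

def to_mw_timestamp(dt: datetime) -> str:
    """Format a datetime as a MediaWiki-style YYYYMMDDHHMMSS string in UTC."""
    if dt.tzinfo is not None:
        # Convert timezone-aware values to UTC; naive values are assumed to be UTC already.
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y%m%d%H%M%S")

print(to_mw_timestamp(datetime(2018, 12, 1)))  # 20181201000000

# Lexicographic comparison: any string that extends '20181201' sorts after it,
# so every timestamp on 1 December 2018 or later satisfies > '20181201'.
print("20181201000000" > "20181201")  # True
print("20181130235959" > "20181201")  # False
```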

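The actual page text is stored in the text table, and the page name in the page table. These column names match older MediaWiki releases; from roughly MediaWiki 1.31 onward, comments, users and text references have been moved into separate tables (comment, actor, slots and content), so check the schema of your version first.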

Give me all edits done since Dec 01 2018 (including that date) with page name and revision text:

SELECT rev_id, rev_page, page_namespace, page_title, rev_text_id, rev_comment, rev_user, rev_user_text, rev_timestamp, old_text
FROM revision r
LEFT JOIN page p
   ON p.page_id = r.rev_page
LEFT JOIN text t
   ON t.old_id = r.rev_text_id
WHERE rev_timestamp > '20181201';

Tools such as MySQL Workbench can copy results as MySQL INSERT statements. Depending on what users did on the old wiki, you might only need to transfer records from three tables (revision, page and text); however, if file uploads, deletions or user rights changes were involved, it gets complicated. You can track these changes through the logging table.

Using the Web Interface

It is of course possible to show more than 500 changes, and changes older than 30 days. The settings that control the options offered on the page are $wgRCLinkLimits and $wgRCLinkDays. You can also open the recent changes page, click "30 days", and edit the URL parameters so that the URL becomes path/to/index.php?title=Special:RecentChanges&days=90&limit=1500 (up to 1500 changes within the last 90 days).
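If you need to try several windows, building the URL programmatically avoids typos. This is a minimal sketch; the helper name recent_changes_url and the host wiki.example.org are placeholders, and the server still will not return entries that have already been purged from recent changes:

```python
from urllib.parse import urlencode

def recent_changes_url(index_php: str, days: int, limit: int) -> str:
    """Build a Special:RecentChanges URL with explicit days and limit parameters."""
    # safe=":" keeps "Special:RecentChanges" readable instead of "Special%3ARecentChanges".
    query = urlencode(
        {"title": "Special:RecentChanges", "days": days, "limit": limit}, safe=":"
    )
    return f"{index_php}?{query}"

print(recent_changes_url("https://wiki.example.org/w/index.php", 90, 1500))
# https://wiki.example.org/w/index.php?title=Special:RecentChanges&days=90&limit=1500
```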

How long recent changes history is retained depends on $wgRCMaxAge. By default it is 90 days, but you might be in luck if the purge job has not yet deleted older entries.

Logs can be viewed without that limitation. Visit Special:Log in your wiki.

Using the API

list=allrevisions lists all page revisions (i.e. changes).

It accepts a start timestamp (arvstart), an end timestamp (arvend) and a direction (arvdir), and supports continuation for results that do not fit in one request.

Example: https://commons.wikimedia.org/w/api.php?action=query&list=allrevisions&arvlimit=1000
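A minimal Python sketch of walking list=allrevisions from a start date with continuation. Assumptions: the function names are hypothetical, the requests are anonymous (a private wiki will reject them unless you add authentication), and the HTTP call is injectable so the loop can be tried without a network. With arvdir=newer, arvstart acts as the lower bound:

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

def http_get_json(api_url: str, params: dict) -> dict:
    """Fetch one API response as JSON (anonymous, no retries)."""
    with urlopen(f"{api_url}?{urlencode(params)}") as resp:
        return json.load(resp)

def all_revisions_since(api_url: str, since: str,
                        get_json: Callable[[str, dict], dict] = http_get_json) -> Iterator[dict]:
    """Yield list=allrevisions page entries from `since` (ISO 8601) forward."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "list": "allrevisions",
        "arvstart": since,
        "arvdir": "newer",  # oldest first, so arvstart is the lower bound
        "arvlimit": "max",
        "arvprop": "ids|timestamp|user|comment",
    }
    while True:
        data = get_json(api_url, params)
        yield from data.get("query", {}).get("allrevisions", [])
        if "continue" not in data:
            break
        # Merge the continuation values (e.g. arvcontinue) into the next request.
        params = {**params, **data["continue"]}
```

For example, `for page in all_revisions_since("https://wiki.example.org/w/api.php", "2018-12-01T00:00:00Z"): print(page["title"])` would list every page touched since the cutoff (the host is a placeholder).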

To see deletions, user rights changes, uploads and other logged actions, use list=logevents.
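A small sketch of the matching list=logevents parameters plus a tally of what kinds of actions occurred, which hints at which changes cannot be copied as plain page edits. The helper names are hypothetical; the parameters are passed to the API the same way as for list=allrevisions, with lecontinue for continuation:

```python
from collections import Counter

def logevents_params(since: str) -> dict:
    """Query parameters for list=logevents from `since` (ISO 8601) forward."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "list": "logevents",
        "lestart": since,
        "ledir": "newer",  # oldest first, so lestart is the lower bound
        "lelimit": "max",
        "leprop": "type|title|user|timestamp|comment|details",
    }

def summarize_log(events: list) -> Counter:
    """Count log entries per (type, action) pair, e.g. ('upload', 'overwrite')."""
    return Counter((e.get("type"), e.get("action")) for e in events)

events = [
    {"type": "upload", "action": "upload", "title": "File:X.png"},
    {"type": "delete", "action": "delete", "title": "Old page"},
    {"type": "upload", "action": "overwrite", "title": "File:X.png"},
]
print(summarize_log(events)[("upload", "upload")])  # 1
```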

Fix the issue

Either use database scripts (back up first) or use Special:Export on the source wiki and Special:Import on the wiki that needs the update.
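Special:Export takes the pages to export as a list of titles, one per line. Since list=allrevisions returns one entry per page with a title key, a sketch like the following (the helper name is hypothetical) turns those entries into text you can paste into the export form. Pages edited on both wikis after the move may still need manual merging:

```python
def export_title_list(page_entries: list) -> str:
    """Return unique page titles, one per line, in first-seen order."""
    seen = {}
    for entry in page_entries:
        # dict keys keep insertion order, so duplicates are dropped without reordering.
        seen.setdefault(entry["title"], None)
    return "\n".join(seen)

print(export_title_list([{"title": "Main Page"}, {"title": "Help"}, {"title": "Main Page"}]))
# Main Page
# Help
```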

Avoid the issue

For a future migration to a new server, $wgReadOnly might be your friend: setting it on the old wiki makes it read-only and avoids this issue in the first place.

There is also Extension:Sync, though I am not sure what it is capable of.

Upvotes: 1
