zipzip12

Reputation: 41

How to find textual differences between revisions on Wikipedia pages with mwclient?

I'm trying to find the textual differences between two revisions of a given Wikipedia page using mwclient. I have the following code:

import mwclient
import difflib

site = mwclient.Site('en.wikipedia.org')
page = site.pages['Bowdoin College']
texts = [rev for rev in page.revisions(prop='content')]
if texts[-1][u'*'] != texts[0][u'*']:
    pass  # show me the differences between the pages

Thank you!

Upvotes: 2

Views: 472

Answers (1)

AXO

Reputation: 9086

It's not clear whether you want a difflib-generated diff or a MediaWiki-generated diff using mwclient.

In the first case, you have two strings (the text of two revisions) and you want to get the diff using difflib:

...
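t1 = texts[-1][u'*']  # oldest revision (revisions() yields newest first by default)
t2 = texts[0][u'*']   # newest revision
# lineterm='' keeps the '---', '+++' and '@@' header lines free of trailing newlines
print('\n'.join(difflib.unified_diff(t1.splitlines(), t2.splitlines(), lineterm='')))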

(difflib can also generate an HTML diff with difflib.HtmlDiff; refer to the documentation for more info.)
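For the HTML variant, a minimal sketch along these lines might work. The function name `html_diff`, the column labels and the sample strings are placeholders of mine, not part of the question; in practice you would pass `t1` and `t2` instead of the samples.

```python
import difflib


def html_diff(old_text, new_text):
    """Return a standalone HTML page with a side-by-side diff of two strings."""
    differ = difflib.HtmlDiff(wrapcolumn=80)  # wrap long wikitext lines
    return differ.make_file(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc='older revision',
        todesc='newer revision',
        context=True,  # show only changed lines plus some context
        numlines=2,    # number of context lines around each change
    )


# Placeholder strings standing in for the two revision texts.
sample_old = "first line\nsecond line\nthird line"
sample_new = "first line\nsecond line, edited\nthird line"
page_html = html_diff(sample_old, sample_new)
```

`make_file` returns a complete HTML document as a string, so you can write it to a file and open it in a browser; `make_table` returns just the table if you want to embed it in your own page.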

But if you want the MediaWiki-generated HTML diff using mwclient, you'll need the revision IDs:

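# TODO: Loading all revisions is slow,
# try to load only as many as required.
# Note: revisions() yields newest first by default, so the
# last list item is the oldest revision and the first is the newest.
revisions = list(page.revisions(prop='ids'))
last_revision_id = revisions[-1]['revid']
first_revision_id = revisions[0]['revid']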

Then use the compare action to compare the revision ids:

compare_result = site.get('compare', fromrev=last_revision_id, torev=first_revision_id)
html_diff = compare_result['compare']['*']
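As for the TODO about loading every revision: one possible approach, assuming the iterator returned by `page.revisions()` fetches results lazily in batches, is to take only the first item in each direction. The helper name `oldest_and_newest_revids` is mine; `dir='newer'` and `dir='older'` are the ordering options of `revisions()`. Treat this as a sketch rather than a tested recipe.

```python
def oldest_and_newest_revids(page):
    """Return (oldest_revid, newest_revid) without listing the full history.

    `page` is assumed to be an mwclient page object. Only the first item
    of each iterator is consumed, so at most one batch per direction
    should be requested from the API.
    """
    oldest = next(iter(page.revisions(dir='newer', prop='ids')))
    newest = next(iter(page.revisions(dir='older', prop='ids')))
    return oldest['revid'], newest['revid']
```

You could then pass the two IDs to the compare call as `fromrev` and `torev`, for example `oldest_id, newest_id = oldest_and_newest_revids(page)`.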

Upvotes: 3
