SanMelkote

Reputation: 238

How to parse Wikipedia talk page content by contributor?

I am looking to parse a Wikipedia talk page (e.g., https://en.wikipedia.org/wiki/Talk:Elon_Musk) and loop through the text written by each contributor/editor, but I'm not sure how to do it. So far I have the following code:

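import pywikibot as pw

# Titles are case-sensitive after the first character, so use "Elon Musk"
wikiPage = "Elon Musk"
page = pw.Page(pw.Site("en", "wikipedia"), wikiPage)
talkpage = page.toggleTalkPage()  # the corresponding Talk: page
s = talkpage.text                 # raw wikitext of the whole talk page
cs = talkpage.contributors()      # usernames with their edit counts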

Parsing the raw wikitext (s) to find the text each contributor wrote seems hard: I can't tell where one contributor's comment begins and ends, or which comment is a reply to someone else's. Is there a way to get the talk page back as segments (e.g., individual comments) that I can loop through?

Many thanks for your help!

Upvotes: 0

Views: 324

Answers (1)

smartse

Reputation: 1721

I don't know about pywikibot, but you can do this with the regular MediaWiki API. This query fetches the talk page's revisions, with each edit's timestamp, user, edit summary and IDs: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Talk:Elon%20Musk&rvlimit=500&rvprop=timestamp|user|comment|ids
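A minimal Python sketch of that revisions query, using only the standard library (urllib) rather than pywikibot. It assumes format=json with formatversion=2 (where query.pages is a list), follows the API's "continue" values to get past the 500-revision limit, and sends a User-Agent header because Wikimedia asks API clients to identify themselves. The agent string and the function names are placeholders.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent; replace with your own tool name and contact details.
HEADERS = {"User-Agent": "talk-revisions-sketch/0.1 (contact: you@example.org)"}


def extract_revisions(data):
    """Flatten one formatversion=2 query response into simple dicts."""
    revisions = []
    for page in data.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            revisions.append({
                "revid": rev["revid"],
                "parentid": rev["parentid"],
                "user": rev.get("user"),  # missing if the username was hidden
                "timestamp": rev["timestamp"],
                "comment": rev.get("comment", ""),
            })
    return revisions


def fetch_revisions(title):
    """Fetch every revision of a page, newest first, following continuation."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": "500",
        "rvprop": "timestamp|user|comment|ids",
        "format": "json",
        "formatversion": "2",
    }
    revisions = []
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(url, headers=HEADERS)
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        revisions.extend(extract_revisions(data))
        if "continue" not in data:
            return revisions
        # Sending the returned continuation values back requests the next batch.
        params.update(data["continue"])
```

For example, fetch_revisions("Talk:Elon Musk") would return a list of dicts; on a busy talk page this can take many requests.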

Then, since rvprop=ids returns both each revision's revid and its parentid, you can pass those IDs to action=compare to get the change made by each edit, e.g. https://en.wikipedia.org/w/api.php?action=compare&fromrev=944235185&torev=944237256
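A rough sketch of that step: diff each revision against its parent and keep only the added lines, grouped by user. It assumes the default JSON output, where the diff HTML is under compare["*"], and that added lines appear in <td class="diff-addedline ..."> cells; that markup is not a stable interface, so the regex may need adjusting. The input is assumed to be a list of dicts with revid, parentid and user keys built from the revisions query's output (a hypothetical shape). For a changed line the whole new line is kept, not just the modified words, and archiving or refactoring edits will show up as large additions.

```python
import html
import json
import re
import urllib.parse
import urllib.request
from collections import defaultdict

API = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent; replace with your own tool name and contact details.
HEADERS = {"User-Agent": "talk-diff-sketch/0.1 (contact: you@example.org)"}

# Assumed diff markup: added lines sit in <td class="diff-addedline ..."> cells.
ADDED_CELL_RE = re.compile(r'<td class="diff-addedline[^"]*">(.*?)</td>', re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def fetch_diff_html(parentid, revid):
    """Return the diff HTML between a revision and its parent (one API call)."""
    params = {"action": "compare", "fromrev": parentid, "torev": revid, "format": "json"}
    url = API + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["compare"]["*"]


def added_text(diff_html):
    """Extract the plain text of every added line from diff HTML."""
    lines = []
    for cell in ADDED_CELL_RE.findall(diff_html):
        # Drop inline tags such as <div> and <ins>, then decode entities like &amp;.
        lines.append(html.unescape(TAG_RE.sub("", cell)).strip())
    return "\n".join(line for line in lines if line)


def text_by_user(revisions):
    """Map each username to the text added in each of their edits."""
    by_user = defaultdict(list)
    for rev in revisions:
        if not rev["parentid"]:
            continue  # page creation: no parent revision to diff against
        diff_html = fetch_diff_html(rev["parentid"], rev["revid"])
        by_user[rev.get("user")].append(added_text(diff_html))
    return dict(by_user)
```

This makes one request per revision, so for a page with thousands of edits it is worth caching results and throttling requests.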

Upvotes: 1
