Reputation: 507
I am trying to extract the links from the summary (lead) section of a Wikipedia page. I tried the methods below.
This URL returns the links of the Deep learning page:
https://en.wikipedia.org/w/api.php?action=query&prop=links&titles=Deep%20learning
To extract the links of a particular section, I can filter by section id. For example:
for the Definition section of the same page I can use this URL: https://en.wikipedia.org/w/api.php?action=parse&prop=links&page=Deep%20learning&section=1
for the Overview section of the same page I can use this URL: https://en.wikipedia.org/w/api.php?action=parse&prop=links&page=Deep%20learning&section=2
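For reference, the section ids used in these URLs can be looked up with `action=parse&prop=sections`. The following is only a minimal sketch of that lookup: the helper names `build_sections_url` and `section_index` are made up for illustration, and the sample response is a hand-written dict in the shape the API is assumed to return with `format=json`, not real output.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_sections_url(page):
    """Build a URL that asks the parse API for the section list of a page."""
    params = {"action": "parse", "prop": "sections", "page": page, "format": "json"}
    return API_URL + "?" + urlencode(params)


def section_index(response, heading):
    """Return the section index whose heading text equals `heading`, or None."""
    for section in response.get("parse", {}).get("sections", []):
        if section.get("line") == heading:
            return section.get("index")
    return None


# Hand-written sample in the assumed response shape (not real API output).
sample_response = {
    "parse": {
        "title": "Deep learning",
        "sections": [
            {"line": "Definition", "index": "1"},
            {"line": "Overview", "index": "2"},
        ],
    }
}
```

The returned index can then be passed as the `section` parameter of the `prop=links` request. Headings that contain markup may need extra normalisation before comparing them.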
But I am unable to figure out how to extract only the links from the summary section.
I even tried using pywikibot to extract the linked pages and adjusting the plnamespace variable, but I couldn't get the links for the summary section only.
Upvotes: 1
Views: 751
Reputation: 333
You can use Pywikibot with the following commands:
>>> import pywikibot
>>> from pywikibot import textlib
>>> site = pywikibot.Site('wikipedia:en') # create a Site object
>>> page = pywikibot.Page(site, 'Deep learning') # create a Page object
>>> sect = textlib.extract_sections(page.text, site) # divide content into sections
>>> links = sorted(link.group('title') for link in pywikibot.link_regex.finditer(sect.header)) # link titles in the text before the first heading
Now links is a list containing all link titles in alphabetical order. If you prefer Page objects as the result, you can create them with
>>> pages = [pywikibot.Page(site, title) for title in links]
It's up to you to create a script from these code snippets.
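To illustrate the idea behind these commands without a Pywikibot installation, here is a standard-library sketch that cuts the wikitext at the first heading and collects the link targets from the part before it. It is an approximation under stated assumptions: the `HEADING` and `WIKILINK` patterns and the function names are simplified stand-ins written for this example, not Pywikibot's own `extract_sections` or `link_regex`, and the sample wikitext is invented.

```python
import re

# Simplified patterns (assumptions for this sketch, not Pywikibot's regexes).
HEADING = re.compile(r"^={2,}[^\n]*={2,}[ \t]*$", re.MULTILINE)
WIKILINK = re.compile(r"\[\[([^\]\[|<>{}]+)(?:\|[^\]]*)?\]\]")


def lead_text(wikitext):
    """Return the text before the first heading (the lead section)."""
    match = HEADING.search(wikitext)
    return wikitext[:match.start()] if match else wikitext


def lead_link_titles(wikitext):
    """Return the sorted, de-duplicated link targets found in the lead."""
    titles = {m.group(1).strip() for m in WIKILINK.finditer(lead_text(wikitext))}
    return sorted(titles)


# Invented sample wikitext for demonstration.
sample = (
    "'''Deep learning''' is part of [[machine learning]] based on "
    "[[Artificial neural network|neural networks]].\n"
    "\n"
    "== Definition ==\n"
    "See [[Feature learning]].\n"
)
print(lead_link_titles(sample))
```

Because this works on the raw wikitext, links that only appear through transcluded templates are not picked up, and targets such as file or category links are not filtered out.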
Upvotes: 2
Reputation: 1721
You need to use section 0, which is the lead text before the first heading: https://en.wikipedia.org/w/api.php?action=parse&prop=links&page=Deep%20learning&section=0
Note, however, that this also includes the links in the {{machine learning bar}} and {{Artificial intelligence|Approaches}} templates (shown on the right side of the page).
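A minimal Python sketch of calling this endpoint might look as follows. The function names, the `User-Agent` string, and the optional namespace filter are assumptions made for illustration. The response handling assumes the `format=json` layout, in which each link entry carries its title under `"*"` (or under `"title"` with `formatversion=2`).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_section_links_url(page, section=0):
    """Build the parse-API URL for the links of one section (0 = lead)."""
    params = {
        "action": "parse",
        "prop": "links",
        "page": page,
        "section": section,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def link_titles(response, namespace=None):
    """Extract link titles from a decoded response, optionally by namespace."""
    links = response.get("parse", {}).get("links", [])
    return [
        link.get("title") or link.get("*")
        for link in links
        if namespace is None or link.get("ns") == namespace
    ]


def fetch_section_links(page, section=0):
    """Fetch and decode the links of one section (needs network access)."""
    request = Request(
        build_section_links_url(page, section),
        headers={"User-Agent": "lead-links-example/0.1"},  # placeholder value
    )
    with urlopen(request) as reply:
        return link_titles(json.load(reply))
```

Calling `fetch_section_links("Deep learning")` would request section 0. As noted above, the result still contains the links that come from templates transcluded in the lead, so this sketch does not separate them from links written in the lead text itself.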
Upvotes: 2