Laz22434

Reputation: 373

Get all titles from Wikipedia with Python

I need to get all page titles from the Italian Wikipedia. I have already written this code:

    import requests

    S = requests.Session()

    URL = "https://it.wikipedia.org/w/api.php"

    PARAMS = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "aplimit": "max",
    }

    R = S.get(url=URL, params=PARAMS)
    DATA = R.json()
    PAGES = DATA["query"]["allpages"]
    for page in PAGES:
        print(page['title'])

But this only prints the first 500 titles. How can I get the rest of the titles?

Upvotes: 0

Views: 241

Answers (1)

scr

Reputation: 962

I used your request and found the following:

>>> DATA["continue"]
{'apcontinue': "'Ndranghetista", 'continue': '-||'}

And as per the All pages documentation:

apcontinue: When more results are available, use this to continue.

So to keep going do:

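full_data = []
full_data.extend(DATA["query"]["allpages"])

# Keep requesting for as long as the response carries a "continue" block
while "continue" in DATA:
    PARAMS.update(DATA["continue"])
    R = S.get(url=URL, params=PARAMS)
    DATA = R.json()
    full_data.extend(DATA["query"]["allpages"])

On the stopping condition: loop until the response no longer contains a "continue" key, which is how the API signals that there are no more results. Don't test "batchcomplete" for this. It only says that the current batch of items has been returned in full, and it can appear in responses that still carry a "continue" block.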
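
If you would rather not keep every title in memory, the same continuation logic can be wrapped in a generator. This is only a sketch: it assumes the loop should end once a response has no "continue" key, and the names fetch_json and iter_titles are made up for this example, not part of requests or the MediaWiki API. The request function is passed in as a parameter so the paging logic can be tried against canned responses without hitting the network.

```python
from typing import Any, Callable, Dict, Iterator

API_URL = "https://it.wikipedia.org/w/api.php"


def fetch_json(params: Dict[str, Any]) -> Dict[str, Any]:
    """Send one GET request to the API and return the decoded JSON."""
    # Imported here so the paging logic below has no hard dependency on requests.
    import requests

    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_titles(
    fetch: Callable[[Dict[str, Any]], Dict[str, Any]] = fetch_json,
) -> Iterator[str]:
    """Yield page titles batch by batch until no "continue" block is returned."""
    base_params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "aplimit": "max",
    }
    continuation: Dict[str, Any] = {}
    while True:
        # Merge the continuation values from the previous response, if any.
        data = fetch({**base_params, **continuation})
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break  # assumed end-of-results signal: no "continue" key
        continuation = data["continue"]
```

To look at just the first 1000 titles you could write `for title in itertools.islice(iter_titles(), 1000): print(title)`. Going through the whole wiki this way takes a lot of requests, so expect it to run for a while.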

Upvotes: 1
