Jamgreen

Reputation: 11039

How do I crawl an API with multiple pages

I have a URL for an API that returns some JSON:

{
  "posts": [ ... ],
  "page": { ... },
  "next": "/posts.json?page=2"
}

where the next value (/posts.json?page=2) points to the following page and is null when there are no more pages.

How can I, in Python, write a function that follows all the pages and collects all the posts?

I guess I will have to do something like

def get_posts(url, posts=[]):
  json = request(url).json()

  posts.append(json.posts)

  while json.next_page:
    return get_posts(json.next_page, posts)

but I guess I could do something with yield?

Upvotes: 2

Views: 827

Answers (1)

saq7

Reputation: 1818

import requests

def get_posts(url, posts=None):
  # use None as the default so the list isn't shared between calls
  posts = [] if posts is None else posts

  # make the request and parse the JSON body
  data = requests.get(url).json()

  # extend the posts list with this page's posts
  posts.extend(data['posts'])

  # if there is a next page, recurse, passing the accumulated posts
  # (note: in your example 'next' is a relative path, so you may need
  # to join it onto the API's base URL before requesting it)
  if data.get('next'):
    return get_posts(data.get('next'), posts)

  # if there isn't a next page, return the posts
  return posts
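
Since the question also asks about yield: a generator is a natural fit for pagination, because it streams posts page by page with no recursion and no accumulator list. This is a sketch under the same assumptions as above (each page is a JSON object with "posts" and "next" keys); the fetch parameter is a hypothetical hook added here so the pagination loop can be exercised without a network.

```python
def iter_posts(url, fetch=None):
  # fetch is an injectable page loader (hypothetical parameter, not part of
  # the original question); the default does an HTTP GET and parses JSON
  if fetch is None:
    import requests  # imported lazily so a custom fetch needs no requests
    fetch = lambda u: requests.get(u).json()

  # follow 'next' links until one is missing or null
  while url:
    data = fetch(url)
    yield from data['posts']
    url = data.get('next')
```

list(iter_posts(start_url)) gives the same flat list as the recursive version, but a caller can also stop iterating early without fetching the remaining pages.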

Upvotes: 1

Related Questions