Reputation: 3523
I have the following code, and I don't know how to print the links on the next page. How do I go to the next pages?
#!/usr/bin/python2.4
# -*- coding: utf-8 -*-
import pprint
from apiclient.discovery import build

def main():
    service = build("customsearch", "v1",
                    developerKey="")
    res = service.cse().list(
        q='lectures',
        cx='013036536707430787589:_pqjad5hr1a',
        num=10, #Valid values are integers between 1 and 10, inclusive.
    ).execute()
    for value in res:
        #print value
        if 'items' in value:
            for results in res[value]:
                print results['formattedUrl']

if __name__ == '__main__':
    main()
Upvotes: 11
Views: 7840
Reputation: 774
I made a function that returns a given number of image links starting from a given index. If you want all result types rather than just images, remove searchType='image' from the list() calls.
from googleapiclient.discovery import build

def search_images(query, start=1, num_images=10):
    api_key = "api_key"
    resource = build("customsearch", "v1", developerKey=api_key).cse()
    cse_id = "search_engine_id"
    max_num_results = 10  # the API returns at most 10 results per request
    # There is an implicit range for custom search, values must be between [1, 201]
    if num_images + start > 201:
        num_images = 201 - start
    items = []
    if num_images <= max_num_results:
        results = resource.list(
            q=query,
            cx=cse_id,
            searchType="image",
            start=start,
            num=num_images
        ).execute()
        items = results.get('items', [])  # 'items' is missing when nothing matched
    else:
        end = start + num_images  # one past the last index we want
        for i in range(start, end, max_num_results):
            results = resource.list(
                q=query,
                cx=cse_id,
                searchType="image",
                start=i,
                num=min(max_num_results, end - i)  # the last batch may be smaller
            ).execute()
            items += results.get('items', [])
    links = [x['link'] for x in items]
    next_item_index = start + num_images
    if next_item_index == 201:
        next_item_index = "EOF"
    print(next_item_index)
    return links, next_item_index
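The start/num bookkeeping in the loop is the easiest part to get wrong, so here it is isolated in a pure function that only computes the (start, num) pair for each request. This is a sketch: plan_batches is a hypothetical name, and the default limit of 201 is simply the cap the answer above uses, not a value taken from the API documentation.

```python
def plan_batches(start, count, per_request=10, limit=201):
    """Return (start, num) pairs covering `count` results beginning at `start`.

    Hypothetical helper: `limit` mirrors the cap used in the answer above,
    so no batch reaches index `limit` or beyond.
    """
    end = min(start + count, limit)  # one past the last index to fetch
    batches = []
    for i in range(start, end, per_request):
        batches.append((i, min(per_request, end - i)))  # last batch may be short
    return batches


print(plan_batches(1, 25))  # [(1, 10), (11, 10), (21, 5)]
```

Each pair would then be passed as start=... and num=... to one resource.list(...) call.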
Upvotes: 0
Reputation: 887
# define the pages you want to scrape
max_page = 3

def google_search(service, query_keywords, api_key, cse_id):
    res = service.cse().list(q=query_keywords, cx=cse_id).execute()
    return res

def google_next_page(service, query_keywords, api_key, cse_id, res, page, max_page, url_items):
    next_res = service.cse().list(
        q=query_keywords,
        cx=cse_id,
        num=10,
        start=res['queries']['nextPage'][0]['startIndex'],
    ).execute()
    for item in next_res['items']:
        url_items.append(item)
    page += 1
    if page == max_page:
        return url_items
    return google_next_page(service, query_keywords, api_key, cse_id, next_res, page, max_page, url_items)
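The same pagination can also be written as a plain loop, which avoids growing the call stack and stops cleanly when a response has no nextPage entry. This is a sketch, not the answerer's code: collect_pages and fake_list_page are hypothetical names, and list_page stands for any callable that performs one request for a given start index and returns the parsed JSON. The fake only imitates the response shape so the loop can be tried without an API key.

```python
def collect_pages(list_page, max_pages=3):
    """Fetch up to `max_pages` pages (first page included) and return all items.

    `list_page(start)` is assumed to run one Custom Search request for the
    given 1-based start index and return the parsed JSON response.
    """
    items = []
    start = 1  # the API's start index is 1-based
    for _ in range(max_pages):
        res = list_page(start)
        items.extend(res.get("items", []))  # "items" is absent when there are no results
        next_page = res.get("queries", {}).get("nextPage")
        if not next_page:  # no nextPage entry: this was the last page
            break
        start = next_page[0]["startIndex"]
    return items


def fake_list_page(start, total=25, per_page=10):
    """Offline stand-in for one request: `total` results, `per_page` at a time."""
    stop = min(start + per_page, total + 1)
    res = {
        "items": [{"link": "https://example.com/%d" % i} for i in range(start, stop)],
        "queries": {},
    }
    if stop <= total:
        res["queries"]["nextPage"] = [{"startIndex": stop}]
    return res


print(len(collect_pages(fake_list_page, max_pages=2)))  # 20
print(len(collect_pages(fake_list_page, max_pages=5)))  # 25

# With a real client (assumes `service` and `cse_id` as in the question):
# items = collect_pages(lambda start: service.cse().list(
#     q="lectures", cx=cse_id, num=10, start=start).execute())
```

Here max_pages counts the first page too, so max_pages=3 means the initial request plus two follow-ups.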
Upvotes: 2
Reputation: 25609
The response contains a 'nextPage' entry under 'queries' (a list holding one dictionary). Its 'startIndex' value is the start index to use for the next request, like so:
res = service.cse().list(
    q='lectures',
    cx='013036536707430787589:_pqjad5hr1a',
    num=10, #Valid values are integers between 1 and 10, inclusive.
).execute()

next_response = service.cse().list(
    q='lectures',
    cx='013036536707430787589:_pqjad5hr1a',
    num=10,
    start=res['queries']['nextPage'][0]['startIndex'],
).execute()
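On the last page of results there is no 'nextPage' entry, so indexing it directly raises a KeyError. A minimal sketch of a safer lookup, assuming the documented nesting res["queries"]["nextPage"][0]["startIndex"]; next_start_index is a hypothetical helper name and the sample dictionaries are trimmed-down stand-ins, not real API output.

```python
def next_start_index(res):
    """Return the start index for the next page, or None on the last page.

    Assumes res["queries"]["nextPage"] is a one-element list of request
    metadata that is simply absent when there are no more results.
    """
    next_page = res.get("queries", {}).get("nextPage")
    if not next_page:
        return None
    return next_page[0]["startIndex"]


# Trimmed-down responses with the same nesting as the API's JSON.
first = {"queries": {"request": [{"startIndex": 1}], "nextPage": [{"startIndex": 11}]}}
last = {"queries": {"request": [{"startIndex": 91}]}}
print(next_start_index(first))  # 11
print(next_start_index(last))   # None
```

A loop can then keep requesting pages while next_start_index(res) is not None.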
Upvotes: 14
Reputation: 595
My suggestion is to add one more parameter. Your current code passes q, cx and num; add start as well and run the request again. The API's start index is 1-based, so start=11 returns the second page of 10 results:
res = service.cse().list(
    q='lectures',
    cx='013036536707430787589:_pqjad5hr1a',
    num=10,
    start=11,
).execute()
This mirrors Google's web search URLs: the first results page has no start parameter, the second page's URL contains start=10, the third start=20, and so on. Those URL offsets count from 0, which is why the API value is one higher.
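A small sketch of the page-to-offset arithmetic, assuming 10 results per page and pages numbered from 1: the Custom Search JSON API counts start from 1, while google.com result URLs count from 0. The function names are hypothetical.

```python
def api_start(page, num=10):
    """`start` for the Custom Search JSON API, which counts results from 1."""
    return (page - 1) * num + 1


def web_start(page, per_page=10):
    """`start` as it appears in google.com result URLs, which count from 0."""
    return (page - 1) * per_page


print([api_start(p) for p in range(1, 4)])  # [1, 11, 21]
print([web_start(p) for p in range(1, 4)])  # [0, 10, 20]
```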
Good luck
Upvotes: 8