Jayanta Panja

Reputation: 1

How to export all 2,000 affected URLs to an Excel sheet in Google Search Console?

I can download the Excel file in Google Search Console, but it lists only 1,000 URLs instead of 2,000. How can I download an Excel file that lists all 2,000 or more affected URLs?

In the Page indexing tab of Google Search Console, I go to the Not indexed section. It lists 11 reasons in total, such as Excluded by 'noindex' tag, Page with redirect, and so on. When I click Excluded by 'noindex' tag, it states that 17.7k URLs are affected altogether. But when I download the Excel file in order to resolve the issue, I get only 1k URLs instead of 17.7k.

Upvotes: 0

Views: 495

Answers (1)

Maniac

Reputation: 237

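You can use the Google Search Console API to retrieve the full list of URLs. The API allows you to retrieve up to 5,000 URLs at a time. Note that urlCrawlErrorsSamples belongs to the legacy crawl errors part of the v3 API, which Google has since retired, so this request may no longer return data. You can try the following Python code: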

import requests
from urllib.parse import quote

# Set up the API endpoint and parameters
site_url = '[SITE_URL]' # Replace with your property URL exactly as it appears in Search Console
# The site URL is part of the request path, so it must be URL-encoded
api_endpoint = f"https://www.googleapis.com/webmasters/v3/sites/{quote(site_url, safe='')}/urlCrawlErrorsSamples"
error_category = '[ERROR_CATEGORY]' # Replace with the error category (e.g., 'notFound', 'serverError', 'soft404')
platform_type = '[PLATFORM_TYPE]' # Replace with the platform type (e.g., 'web')

# Set up the authentication headers
access_token = '[YOUR_ACCESS_TOKEN]' # Replace with your access token
headers = {'Authorization': f'Bearer {access_token}'}

# Send the API request (category and platform are the query parameters for this endpoint)
response = requests.get(api_endpoint, headers=headers, params={'category': error_category, 'platform': platform_type}, timeout=30)

# Process the API response
if response.status_code == 200:
    results = response.json()
    # The key may be absent when there are no samples, so fall back to an empty list
    urls = [url['pageUrl'] for url in results.get('urlCrawlErrorSample', [])]
    print(f'Retrieved {len(urls)} URLs:')
    print('\n'.join(urls))
else:
    print(f'Error retrieving data: {response.status_code} - {response.text}')
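
Since the question asks for a spreadsheet, one option is to write the retrieved list to a CSV file, which Excel can open directly. This is only a minimal sketch: the function name save_urls_to_csv and the output file name are made up for illustration, and it assumes urls is a plain list of strings like the one built above.

```python
import csv


def save_urls_to_csv(urls, path):
    """Write unique URLs (original order kept) to a one-column CSV file.

    Returns the number of URL rows written, not counting the header.
    """
    # dict.fromkeys drops duplicates while preserving first-seen order
    unique_urls = list(dict.fromkeys(urls))
    # newline="" stops the csv module from emitting blank rows on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["URL"])  # header row
        writer.writerows([u] for u in unique_urls)
    return len(unique_urls)
```

For example, calling save_urls_to_csv(urls, 'affected_urls.csv') after the request succeeds would produce a file with one URL per row.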

Upvotes: 0
