Johannes

Reputation: 177

Webmaster Tools API: get more than 1000 crawl errors

I'm using the new Webmaster Tools API to get all of my site's crawl errors (plus details). Unfortunately, it only gives me 1,000, but I have around 10,000. Is there a way to get all of them?

This is the code I use:

package main;

import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.auth.oauth2.GoogleTokenResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;

import com.google.api.services.webmasters.Webmasters;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSample;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSamplesListResponse;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;


public class WebmastersCommandLine {

  private static String CLIENT_ID = "...";
  private static String CLIENT_SECRET = "...";

  private static String REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob";

  private static String OAUTH_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly";

  private static String PAGE_URL = "...";

  public static void main(String[] args) throws IOException {
    HttpTransport httpTransport = new NetHttpTransport();
    JsonFactory jsonFactory = new JacksonFactory();

    // Set up the OAuth 2.0 installed-application authorization flow.
    GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
        httpTransport, jsonFactory, CLIENT_ID, CLIENT_SECRET, Arrays.asList(OAUTH_SCOPE))
        .setAccessType("online")
        .setApprovalPrompt("auto").build();

    String url = flow.newAuthorizationUrl().setRedirectUri(REDIRECT_URI).build();
    System.out.println("open URL:");
    System.out.println("  " + url);
    System.out.println("code:");
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    String code = br.readLine();

    // Exchange the authorization code for an access token.
    GoogleTokenResponse response = flow.newTokenRequest(code).setRedirectUri(REDIRECT_URI).execute();
    GoogleCredential credential = new GoogleCredential().setFromTokenResponse(response);

    // Create a new authorized API client
    Webmasters service = new Webmasters.Builder(httpTransport, jsonFactory, credential)
        .setApplicationName("WebmastersCommandLine")
        .build();

    // Request the sample of "not found" (404) crawl errors for the site.
    Webmasters.Urlcrawlerrorssamples.List req2 = service.urlcrawlerrorssamples().list(PAGE_URL, "notFound", "web");

    try {
        UrlCrawlErrorsSamplesListResponse urlList = req2.execute();

        System.out.println("start");

        // For each sampled URL, fetch its details (e.g. which pages link to it).
        for (UrlCrawlErrorsSample sample : urlList.getUrlCrawlErrorSample()) {
            Webmasters.Urlcrawlerrorssamples.Get req3 = service.urlcrawlerrorssamples().get(PAGE_URL, sample.getPageUrl(), "notFound", "web");
            UrlCrawlErrorsSample details = req3.execute();

            System.out.println(sample.getPageUrl() + "," + details.getUrlDetails().getLinkedFromUrls());
        }
    } catch (IOException e) {
        System.out.println("An error occurred: " + e);
    }

    System.out.println("done");
  }

}

This, however, only gives me a list of 1,000 errors, but I need all 10,000 of them. Does anybody know a way to do that?

Upvotes: 1

Views: 1169

Answers (1)

John Mueller

Reputation: 1481

The Webmaster Tools API URL Crawl Errors Sample method returns a sample of 1000 crawl errors. It's not meant to return a complete list (you could compile that from your server logs). If you want more samples through the API, one thing you can do is to mark these errors as fixed and check back in a day. It will then generate a set of samples from the remaining crawl errors.
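For what it's worth, here's a minimal sketch of that mark-as-fixed step, reusing the Java client from the question. Note that markAsFixed is a write call, so it requires the full https://www.googleapis.com/auth/webmasters scope rather than the readonly scope the question's code requests; the helper method name is just for illustration.

// Sketch: mark every sampled 404 as fixed so that a later list() call
// draws its samples from the remaining crawl errors.
// Assumes an authorized Webmasters "service" like the one built above.
static void markSamplesAsFixed(Webmasters service, String siteUrl) throws IOException {
    UrlCrawlErrorsSamplesListResponse samples = service.urlcrawlerrorssamples()
        .list(siteUrl, "notFound", "web")
        .execute();

    if (samples.getUrlCrawlErrorSample() == null) {
        return; // no samples left for this category/platform
    }

    for (UrlCrawlErrorsSample sample : samples.getUrlCrawlErrorSample()) {
        // Removes this URL from the current sample set in Webmaster Tools.
        service.urlcrawlerrorssamples()
            .markAsFixed(siteUrl, sample.getPageUrl(), "notFound", "web")
            .execute();
    }
}

Running this, waiting a day, and then calling list() again should surface a fresh batch of samples; you can repeat the cycle until the remaining errors run out.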

The order of the samples is the same as in the UI, so the more important ones will be the first ones you see. This means that there are diminishing returns as you move on, with later crawl errors being either similar to the previous ones, or at least seen as being less critical. The original blog post has more on the prioritization:

We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it’s linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search.

Upvotes: 1
