David

Reputation: 583

Google Analytics core reporting API, fetch and dump

I'm trying to write a Google Analytics connector as a Python Lambda function that fetches and stores the values of all the metrics and dimensions the Google Core Reporting API provides. So far I can query individual metrics/dimensions from the API, but I'm unsure how to dump all the data as JSON, since the API only returns the values I ask for.

"""Hello Analytics Reporting API V4."""

import argparse

from googleapiclient.discovery import build  # "apiclient" is the legacy alias of this package
import httplib2
from oauth2client import client
from oauth2client import file
from oauth2client import tools

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
CLIENT_SECRETS_PATH = 'client_secrets.json' # Path to client_secrets.json file.
VIEW_ID = 'xxxxxxx'


def initialize_analyticsreporting():
  """Initializes the analyticsreporting service object.

  Returns:
    analytics an authorized analyticsreporting service object.
  """
  # Parse command-line arguments.
  parser = argparse.ArgumentParser(
      formatter_class=argparse.RawDescriptionHelpFormatter,
      parents=[tools.argparser])
  flags = parser.parse_args([])

  # Set up a Flow object to be used if we need to authenticate.
  flow = client.flow_from_clientsecrets(
      CLIENT_SECRETS_PATH, scope=SCOPES,
      message=tools.message_if_missing(CLIENT_SECRETS_PATH))

  # Prepare credentials, and authorize HTTP object with them.
  # If the credentials don't exist or are invalid run through the native client
  # flow. The Storage object will ensure that if successful the good
  # credentials will get written back to a file.
  storage = file.Storage('analyticsreporting.dat')
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    credentials = tools.run_flow(flow, storage, flags)
  http = credentials.authorize(http=httplib2.Http())

  # Build the service object.
  analytics = build('analyticsreporting', 'v4', http=http)

  return analytics

def get_report(analytics):
  # Use the Analytics Service Object to query the Analytics Reporting API V4.
  return analytics.reports().batchGet(
      body={
        "reportRequests": [
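        {
          "viewId": VIEW_ID,
          # At least one metric is needed; ga:visits matches the output shown below.
          "metrics": [{"expression": "ga:visits"}]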
        }]
      }
  ).execute()


def print_response(response):
  """Parses and prints the Analytics Reporting API V4 response"""

  for report in response.get('reports', []):
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])

    for row in rows:
      dimensions = row.get('dimensions', [])
      dateRangeValues = row.get('metrics', [])

      for header, dimension in zip(dimensionHeaders, dimensions):
        print (header + ': ' + dimension)

      for i, values in enumerate(dateRangeValues):
        print ('Date range (' + str(i) + ')')
        for metricHeader, value in zip(metricHeaders, values.get('values')):
          print (metricHeader.get('name') + ': ' + value)


def main():

  analytics = initialize_analyticsreporting()
  response = get_report(analytics)
  print_response(response)

if __name__ == '__main__':
  main()

Above is my existing code for fetching data; it currently produces this output:

Date range (0)
ga:visits: 6

Instead of this, I'm trying to get all the 500+ metrics that Google Analytics provides.

Upvotes: 0

Views: 1071

Answers (1)

Max

Reputation: 13334

As of now, I'm able to query the individual metrics/dimensions values from the api but unsure how to dump all the data as json as it only returns values which I'm asking for.

Yes, that's how the API works: you need to query for specific dimensions and metrics, and you only get what you asked for.
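To make "you only get what you asked for" concrete, here is a minimal sketch of a request body that names its dimensions and metrics explicitly, plus a helper that serializes any response dict to JSON. The function names `build_request_body` and `response_to_json`, the date range, and the chosen dimension/metric names are my own illustrative assumptions, not something from the question's code.

```python
import json


def build_request_body(view_id, dimension_names, metric_names):
    """Build a batchGet body that asks for exactly the named fields."""
    return {
        "reportRequests": [{
            "viewId": view_id,
            # Assumed date range for illustration only.
            "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
            "dimensions": [{"name": name} for name in dimension_names],
            "metrics": [{"expression": name} for name in metric_names],
        }]
    }


def response_to_json(response):
    """Serialize a response (a plain dict) to a JSON string."""
    return json.dumps(response, indent=2, sort_keys=True)


# Placeholder view ID, as in the question.
body = build_request_body("xxxxxxx", ["ga:date"], ["ga:sessions", "ga:pageviews"])
print(response_to_json(body))
```

With the question's code, `analytics.reports().batchGet(body=body).execute()` returns a dict, so the same `response_to_json` helper should be able to dump it as-is.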

I'm trying to get all the 500+ metrics that Google Analytics provides.

Out of the box you can't: the GA API limits each request to 7 dimensions and 10 metrics (see the v3 documentation quoted below; v4 imposes similar per-request limits):

https://developers.google.com/analytics/devguides/reporting/core/v3/reference#largeDataResults
"allowing a maximum of 7 dimensions and 10 metrics in any one API request"

The workaround is to use a custom dimension as an identifier, such as User ID + session ID, that uniquely identifies each session. You can then run multiple API queries, each including that custom dimension, to gather more dimensions/metrics, and re-aggregate the results on that custom dimension.
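A rough sketch of that workaround, under assumptions: the metric list is split into groups of at most 10, one query is run per group (not shown here), and the rows are merged on the shared dimension values. The helper names `chunk`, `merge_responses`, and `fake_response` are hypothetical, the sample responses are hand-written to mimic the shape parsed by `print_response` in the question, and only the first date range of each row is used.

```python
from typing import Dict, Iterator, List, Tuple

MAX_METRICS = 10  # per-request metric cap from the documentation quoted above


def chunk(items: List[str], size: int = MAX_METRICS) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def merge_responses(responses: List[dict]) -> Dict[Tuple[str, ...], Dict[str, str]]:
    """Merge rows from several responses, keyed on their dimension values."""
    merged: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for response in responses:
        for report in response.get("reports", []):
            entries = (report.get("columnHeader", {})
                       .get("metricHeader", {})
                       .get("metricHeaderEntries", []))
            names = [entry.get("name") for entry in entries]
            for row in report.get("data", {}).get("rows", []):
                key = tuple(row.get("dimensions", []))
                date_ranges = row.get("metrics") or [{}]
                values = date_ranges[0].get("values", [])  # first date range only
                merged.setdefault(key, {}).update(zip(names, values))
    return merged


def fake_response(metric: str, value: str) -> dict:
    """Hand-written stand-in for one API response (illustration only)."""
    return {"reports": [{
        "columnHeader": {
            "dimensions": ["ga:dimension1"],
            "metricHeader": {"metricHeaderEntries": [{"name": metric}]},
        },
        "data": {"rows": [{
            "dimensions": ["user42-session7"],
            "metrics": [{"values": [value]}],
        }]},
    }]}


groups = list(chunk(["ga:metric%d" % i for i in range(23)]))  # placeholder names
merged = merge_responses([fake_response("ga:sessions", "1"),
                          fake_response("ga:pageviews", "4")])
print([len(g) for g in groups])
print(merged)
```

Each group from `chunk` would become the `metrics` list of its own request, and every request has to include the same identifying dimension so that the rows line up when merged.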

Here is a library that explains this in more detail:
https://github.com/aiqui/ga-download

Upvotes: 2
