Abhay Singh
Abhay Singh

Reputation: 23

Google Analytics Data to Pandas Dataframe

I'm trying to send google analytics data to a pandas dataframe using the google analytics api. I've followed along the code examples that are available in the official documentation and I now have code that manages to print out the data that I need. I need help figuring out how to send the data to a pandas dataframe instead of just printing it out.

Once I execute the query, this is the raw output that i get:

{'kind': 'analytics#gaData', 'id': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'query': {'start-date': '7daysAgo', 'end-date': 'today', 'ids': 'ga:XXXXXXX', 'dimensions': 'ga:date', 'metrics': ['ga:sessions', 'ga:transactions'], 'start-index': 1, 'max-results': 1000}, 'itemsPerPage': 1000, 'totalResults': 8, 'selfLink': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'profileInfo': {'profileId': 'XXXXXXX', 'accountId': 'XXXXXXX', 'webPropertyId': 'XXXXXXX', 'internalWebPropertyId': 'XXXXXXX', 'profileName': 'XXXXXXX', 'tableId': 'ga:XXXXXXX'}, 'containsSampledData': False, 'columnHeaders': [{'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'}, {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}, {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}], 'totalsForAllResults': {'ga:sessions': '86913', 'ga:transactions': '312'}, 'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51'], ['20200116', '11396', '38'], ['20200117', '11097', '28'], ['20200118', '10490', '46'], ['20200119', '9829', '34'], ['20200120', '12280', '36'], ['20200121', '8804', '38']]}

The google documentation uses this function to output this data in a print statement:

def print_results(results):

    # Print header.
    output = []
    for header in results.get('columnHeaders'):
        output.append('%30s' % header.get('name'))
    print(''.join(output))

    # Print data table.
    if results.get('rows', []):
        for row in results.get('rows'):
            output = []
            for cell in row:
                output.append('%30s' % cell)
            print(''.join(output))
    else:
        print('No Rows Found')

As you can see, we need to capture results[columnHeaders][name] as the column headers and we need to capture results[rows] as the data that needs to fed into a pandas dataframe.

How can I create a function to put this data in a dataframe?

Upvotes: 2

Views: 1738

Answers (1)

Subhrajyoti Das
Subhrajyoti Das

Reputation: 2710

Try the below code:

import pandas as pd
results = {'kind': 'analytics#gaData', 'id': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'query': {'start-date': '7daysAgo', 'end-date': 'today', 'ids': 'ga:XXXXXXX', 'dimensions': 'ga:date', 'metrics': ['ga:sessions', 'ga:transactions'], 'start-index': 1, 'max-results': 1000}, 'itemsPerPage': 1000, 'totalResults': 8, 'selfLink': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'profileInfo': {'profileId': 'XXXXXXX', 'accountId': 'XXXXXXX', 'webPropertyId': 'XXXXXXX', 'internalWebPropertyId': 'XXXXXXX', 'profileName': 'XXXXXXX', 'tableId': 'ga:XXXXXXX'}, 'containsSampledData': False, 'columnHeaders': [{'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'}, {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}, {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}], 'totalsForAllResults': {'ga:sessions': '86913', 'ga:transactions': '312'}, 'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51'], ['20200116', '11396', '38'], ['20200117', '11097', '28'], ['20200118', '10490', '46'], ['20200119', '9829', '34'], ['20200120', '12280', '36'], ['20200121', '8804', '38']]}

def print_results(results):
    column_names = []
    for header in results.get('columnHeaders'):
        column_names.append(header.get('name'))
    data = results.get('rows')
    create_dataframe(data, column_names)

def create_dataframe(data, column_names):
    df = pd.DataFrame(data, columns = column_names)
    #prints the dataframe
    print(df)

print_results(results)

#output
    ga:date ga:sessions ga:transactions
0  20200114       11965              41
1  20200115       11052              51
2  20200116       11396              38
3  20200117       11097              28
4  20200118       10490              46
5  20200119        9829              34
6  20200120       12280              36
7  20200121        8804              38


Upvotes: 1

Related Questions