Reputation: 23
I'm trying to load Google Analytics data into a pandas DataFrame using the Google Analytics API. I've followed the code examples in the official documentation, and I now have code that prints out the data I need. I need help figuring out how to put the data into a pandas DataFrame instead of just printing it.
Once I execute the query, this is the raw output that I get:
{'kind': 'analytics#gaData', 'id': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'query': {'start-date': '7daysAgo', 'end-date': 'today', 'ids': 'ga:XXXXXXX', 'dimensions': 'ga:date', 'metrics': ['ga:sessions', 'ga:transactions'], 'start-index': 1, 'max-results': 1000}, 'itemsPerPage': 1000, 'totalResults': 8, 'selfLink': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'profileInfo': {'profileId': 'XXXXXXX', 'accountId': 'XXXXXXX', 'webPropertyId': 'XXXXXXX', 'internalWebPropertyId': 'XXXXXXX', 'profileName': 'XXXXXXX', 'tableId': 'ga:XXXXXXX'}, 'containsSampledData': False, 'columnHeaders': [{'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'}, {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}, {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}], 'totalsForAllResults': {'ga:sessions': '86913', 'ga:transactions': '312'}, 'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51'], ['20200116', '11396', '38'], ['20200117', '11097', '28'], ['20200118', '10490', '46'], ['20200119', '9829', '34'], ['20200120', '12280', '36'], ['20200121', '8804', '38']]}
The Google documentation uses this function to print the data:
def print_results(results):
    # Print header.
    output = []
    for header in results.get('columnHeaders'):
        output.append('%30s' % header.get('name'))
    print(''.join(output))
    # Print data table.
    if results.get('rows', []):
        for row in results.get('rows'):
            output = []
            for cell in row:
                output.append('%30s' % cell)
            print(''.join(output))
    else:
        print('No Rows Found')
As you can see, we need to capture the 'name' of each entry in results['columnHeaders'] as the column headers, and results['rows'] as the data that needs to be fed into a pandas DataFrame.
How can I create a function to put this data in a dataframe?
Upvotes: 2
Views: 1738
Reputation: 2710
Try the below code:
import pandas as pd
results = {'kind': 'analytics#gaData', 'id': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'query': {'start-date': '7daysAgo', 'end-date': 'today', 'ids': 'ga:XXXXXXX', 'dimensions': 'ga:date', 'metrics': ['ga:sessions', 'ga:transactions'], 'start-index': 1, 'max-results': 1000}, 'itemsPerPage': 1000, 'totalResults': 8, 'selfLink': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'profileInfo': {'profileId': 'XXXXXXX', 'accountId': 'XXXXXXX', 'webPropertyId': 'XXXXXXX', 'internalWebPropertyId': 'XXXXXXX', 'profileName': 'XXXXXXX', 'tableId': 'ga:XXXXXXX'}, 'containsSampledData': False, 'columnHeaders': [{'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'}, {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}, {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}], 'totalsForAllResults': {'ga:sessions': '86913', 'ga:transactions': '312'}, 'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51'], ['20200116', '11396', '38'], ['20200117', '11097', '28'], ['20200118', '10490', '46'], ['20200119', '9829', '34'], ['20200120', '12280', '36'], ['20200121', '8804', '38']]}
def print_results(results):
    # Collect the column names from the response headers.
    column_names = []
    for header in results.get('columnHeaders'):
        column_names.append(header.get('name'))
    data = results.get('rows')
    return create_dataframe(data, column_names)

def create_dataframe(data, column_names):
    df = pd.DataFrame(data, columns=column_names)
    # prints the dataframe
    print(df)
    # return it as well, so the caller can keep working with it
    return df

df = print_results(results)
#output
    ga:date  ga:sessions  ga:transactions
0  20200114        11965               41
1  20200115        11052               51
2  20200116        11396               38
3  20200117        11097               28
4  20200118        10490               46
5  20200119         9829               34
6  20200120        12280               36
7  20200121         8804               38
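Note that every cell in 'rows' is a string, so the columns above come back as object dtype. If you want numbers and dates you can work with, something along these lines might help. This is only a sketch: the function name results_to_dataframe is made up for illustration, it converts only the columns whose 'dataType' is 'INTEGER' (the only metric type in your sample), and it assumes 'ga:date' is always in YYYYMMDD form, as it is in your output.

```python
import pandas as pd

def results_to_dataframe(results):
    """Hypothetical helper: build a typed DataFrame from the response dict."""
    headers = results.get('columnHeaders', [])
    column_names = [h['name'] for h in headers]
    df = pd.DataFrame(results.get('rows', []), columns=column_names)

    # The API returns every cell as a string; convert the INTEGER columns.
    for h in headers:
        if h.get('dataType') == 'INTEGER':
            df[h['name']] = df[h['name']].astype('int64')

    # Assumption: ga:date is formatted as YYYYMMDD, as in the sample output.
    if 'ga:date' in df.columns:
        df['ga:date'] = pd.to_datetime(df['ga:date'], format='%Y%m%d')
    return df

# Trimmed-down version of the response from the question (first two rows).
sample = {
    'columnHeaders': [
        {'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'},
        {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'},
        {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'},
    ],
    'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51']],
}

df = results_to_dataframe(sample)
print(df.dtypes)
```

Since the function returns the DataFrame instead of printing it, you can pass the real response straight in and carry on with things like df['ga:sessions'].sum().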
Upvotes: 1