Reputation: 1271
Google Analytics treats incremental browser versions as different, so my reports are useless for drawing any helpful conclusions. eg Chrome 45.0.2454.93 is considered a different browser than 45.0.2454.85.
I want to write a Python 2 app that grabs a Google Analytics CSV and aggregates session info for major browser versions.
I'm new to Python but here's my attempt...
from __future__ import division
import csv
from collections import defaultdict
RAWFile = 'somefile.csv'
def default_val():
return [0, 0]
def aggregateaway():
with open(RAWFile, 'r') as inf:
has_header = csv.Sniffer().has_header(inf.read(1024))
inf.seek(0) # rewind
incsv = csv.reader(inf)
if has_header:
next(incsv) # skip header row
reader = csv.DictReader(incsv, 'r')
BrowserVersion = defaultdict(default_val)
for row in reader:
Sessions = int(row["Sessions"])
BrowserVersion[row["BrowserVersion"]][0] += Sessions
writer = csv.writer(open('out.csv', 'w'))
writer.writerow(["BrowserVersion", "Sessions"])
writer.writerows([BrowserVersion] + BrowserVersion[BrowserVersion] for BrowserVersion in BrowserVersion)
I have two problems that I know about:
ValueError('I/O operation on closed file',)
- I think it's because of the logic I use to skip the leading rows before the data.left(BrowserVersion, 2)
? Even so this is flawed because of other browsers versioning conventions. Perhaps I could search for the first .
and then apply the left, x number of characters. How would I add that to the code above?Edit: Some sample CSV data:
# ----------------------------------------
# My Site
# Web Browsers
# 20150828-20150927
# ----------------------------------------
Browser,Operating System,Browser Version,Sessions,Bounce Rate
Safari,iOS,8.0,"1,681",68.91%
Chrome,Windows,45.0.2454.85,"1,200",40.98%
Chrome,Windows,45.0.2454.93,"2,273",40.98%
Upvotes: 2
Views: 180
Reputation: 1271
This is what I ended up using with a lot of help from a colleague. Hopefully Google decide to add this feature to Analytics some time soon:)
#!/usr/bin/env python
import csv
import operator
import pprint
inputfilename = 'input.csv'
outputfilename = 'output.csv'
values = []
with open(inputfilename, 'rb') as csvfile: #Open file
reader = csv.DictReader(filter(lambda row: row[0]!='#', csvfile)) #Skip rows with #
header = reader.next().values()[0] #Gives a list of field names
for rows in reader:
row = rows.values()[0]
values.append({header[i]: row[i] for i in range(len(header))}) #Creates list of csv data in a dictionary
report = {} #Define empty dictionary to aggregate data into
for value in values:
browserstring = value["Operating System"] + " - " + value["Browser"] + " - " + value["Browser Version"].split('.')[0] #Split browser version by '.' to get major version release
if value["Browser"] <> '': #Skip to next to avoid GA column totals in output (i.e. those with a blank browser value)
if browserstring in report:
report [browserstring] += int(value["Sessions"].replace(',','')) #Remove number comma formatting, sum data
else:
report [browserstring] = int(value["Sessions"].replace(',','')) #Remove number formatting and add new reecord (if it does not exist already already)
else:
next
sorted_report = sorted(report.items(), reverse=True, key=operator.itemgetter(1)) #Convert dictionary to tuple to sort values in descending order
#pprint.pprint(sorted_report) #for debugging
with open(outputfilename,'w') as out: #Let's print this to file
csv_out=csv.writer(out)
csv_out.writerow(['Aggregated Browser Version - Major']) #Title
csv_out.writerow(['Browser','Sessions']) #Column headers
for row in sorted_report: #Data from ordered tuple list
csv_out.writerow(row)
Output CSV example two rows:
Upvotes: 2