hangc

Reputation: 5473

Alternative Google Analytics IO with Pandas

The pandas 0.17.1 release has deprecated the pandas.io.ga module.

What are the alternatives for using Google Analytics with pandas now? Is there a reliable library that can be used instead?

Upvotes: 2

Views: 1705

Answers (2)

n1tk

Reputation: 2500

Remote Data Access

Replace the following imports:

from pandas.io import data, wb

with:

from pandas_datareader import data, wb

Functions from pandas.io.data and pandas.io.ga extract data from various Internet sources into a DataFrame. Currently the following sources are supported:

Yahoo! Finance, Google Finance, St. Louis FED (FRED), Kenneth French’s data library, World Bank, Google Analytics

https://github.com/pydata/pandas-datareader

There were discussions about moving the GA module into pandas_datareader, but so far it is not present (not tested; see the issue: https://github.com/pandas-dev/pandas/issues/8961). For now, this gap has been filled by the "googleanalytics" package.

Example:

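import googleanalytics as ga

# Authenticate and get the list of accessible accounts
accounts = ga.authenticate()
# Profile (view) of the first web property of the first account
profile = accounts[0].webproperties[0].profile
# Total pageviews for yesterday
pageviews = profile.core.query.metrics('pageviews').range('yesterday').value
print(pageviews)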

https://github.com/debrouwere/google-analytics
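The `.range('yesterday')` call above uses a relative date. The GA Core Reporting API accepts relative dates such as `today`, `yesterday` or `NdaysAgo` in addition to `YYYY-MM-DD`. If you want to label a result with the concrete dates it covers, a small local resolver can help. This is a minimal sketch: `resolve_ga_date` is a hypothetical helper, not part of googleanalytics or pandas, and it only handles the four forms listed.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical helper, not part of any GA library.
# Matches the relative form "NdaysAgo", e.g. "8daysAgo".
_DAYS_AGO = re.compile(r"^(\d+)daysAgo$")


def resolve_ga_date(value: str, today: Optional[date] = None) -> date:
    """Resolve a GA-style date string to a concrete date.

    Accepts "today", "yesterday", "NdaysAgo" or an ISO "YYYY-MM-DD" string.
    Pass `today` explicitly to make the result deterministic.
    """
    today = today or date.today()
    if value == "today":
        return today
    if value == "yesterday":
        return today - timedelta(days=1)
    match = _DAYS_AGO.match(value)
    if match:
        return today - timedelta(days=int(match.group(1)))
    # Fall back to an absolute date; raises ValueError if malformed.
    return date.fromisoformat(value)


if __name__ == "__main__":
    ref = date(2017, 11, 24)
    print(resolve_ga_date("yesterday", ref))  # 2017-11-23
    print(resolve_ga_date("8daysAgo", ref))   # 2017-11-16
```

Passing `today` explicitly keeps the helper testable; in real use you would normally omit it.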

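Example for pandas-datareader (working code at the time of writing; note that the Google Finance source has since been deprecated in pandas-datareader):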

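import datetime

import pandas_datareader.data as web

# Historical prices for ticker "F" (Ford) from the Google Finance source
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 11, 24)
f = web.DataReader("F", 'google', start, end)
# Select the row for a single trading day
f.loc['2017-11-24']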

Result for a single day: (screenshot omitted: sample of how the DataFrame looks)

Hope it helps!

Upvotes: 2

Matt

Reputation: 313

The Google2Pandas module was created to get around this specific problem. Nothing fancy; it simply does what it says on the box.

v3:

from google2pandas import GoogleAnalyticsQuery

# Replace <valid_ids> with your Google Analytics view ID
query = {
    'ids'           : <valid_ids>,
    'metrics'       : 'pageviews',
    'dimensions'    : ['date', 'pagePath', 'browser'],
    'filters'       : ['pagePath=~iPhone', 'and', 'browser=~Firefox'],
    'start_date'    : '8daysAgo',
    'max_results'   : 10}

conn = GoogleAnalyticsQuery(secrets='client_secrets_v3.json',
                            token_file_name='analytics.dat')
df, metadata = conn.execute_query(**query)
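If you ever need to do this conversion yourself (for example with the raw Google API client), the v3 Core Reporting API returns `columnHeaders` (each with `name` and `dataType`) plus `rows` as lists of strings. The sketch below flattens that shape into one dict per row, which `pandas.DataFrame(records)` accepts directly. It is only an illustration of the kind of work a wrapper like Google2Pandas does, not its actual code; `v3_rows_to_records` is a hypothetical helper and the sample response is hand-written, not real data.

```python
from typing import Any, Callable, Dict, List

# Hypothetical helper, not part of Google2Pandas or the Google API client.
# Casts keyed on the v3 "dataType" field; any other type stays a string.
_CASTS: Dict[str, Callable[[str], Any]] = {"INTEGER": int, "FLOAT": float}


def v3_rows_to_records(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a Core Reporting API v3-style response into one dict per row."""
    headers = response.get("columnHeaders", [])
    # Strip the "ga:" prefix, e.g. "ga:pagePath" -> "pagePath".
    names = [h["name"].split(":", 1)[-1] for h in headers]
    casts = [_CASTS.get(h.get("dataType", ""), str) for h in headers]
    records = []
    # "rows" may be absent when the query matches nothing.
    for row in response.get("rows", []):
        records.append({n: cast(v) for n, cast, v in zip(names, casts, row)})
    return records


# Hand-written sample in the v3 response shape (not real data).
sample = {
    "columnHeaders": [
        {"name": "ga:date", "columnType": "DIMENSION", "dataType": "STRING"},
        {"name": "ga:pageviews", "columnType": "METRIC", "dataType": "INTEGER"},
    ],
    "rows": [["20171123", "42"], ["20171124", "7"]],
}
print(v3_rows_to_records(sample))
```

Building plain dicts first keeps the parsing independent of pandas; the final `pd.DataFrame(...)` call is then a one-liner.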

v4:

from google2pandas import GoogleAnalyticsQueryV4

query = {
    'reportRequests': [{
        'viewId' : <valid_ids>,

        'dateRanges': [{
            'startDate' : '8daysAgo',
            'endDate'   : 'today'}],

        'dimensions' : [
            {'name' : 'ga:date'}, 
            {'name' : 'ga:pagePath'},
            {'name' : 'ga:browser'}],

        'metrics'   : [
            {'expression' : 'ga:pageviews'}],

        'dimensionFilterClauses' : [{
            'operator' : 'AND',
            'filters'  : [
                {'dimensionName' : 'ga:browser',
                 'operator' : 'REGEXP',
                 'expressions' : ['Firefox']},

                {'dimensionName' : 'ga:pagePath',
                 'operator' : 'REGEXP',
                 'expressions' : ['iPhone']}]
        }]
    }]
}


conn = GoogleAnalyticsQueryV4(secrets='client_secrets_v4.json')
df = conn.execute_query(query)
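For the v4 Reporting API, the response is `{"reports": [...]}` with one report per entry in `reportRequests`. Each report has a `columnHeader` (dimension names plus `metricHeader.metricHeaderEntries`) and `data.rows`, where every row's `metrics` holds one `values` list per date range. A minimal sketch of flattening one report into row dicts follows, assuming a single date range as in the query above. `v4_report_to_records` is a hypothetical helper, not Google2Pandas code, and the sample is hand-written.

```python
from typing import Any, Callable, Dict, List

# Hypothetical helper, not part of Google2Pandas or the Google API client.
# Casts keyed on the v4 metric "type" field; any other type stays a string.
_CASTS: Dict[str, Callable[[str], Any]] = {"INTEGER": int, "FLOAT": float}


def v4_report_to_records(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one report of a v4 batchGet response into one dict per row."""
    header = report["columnHeader"]
    # Strip the "ga:" prefix from dimension and metric names.
    dim_names = [d.split(":", 1)[-1] for d in header.get("dimensions", [])]
    entries = header["metricHeader"]["metricHeaderEntries"]
    met_names = [e["name"].split(":", 1)[-1] for e in entries]
    casts = [_CASTS.get(e.get("type", ""), str) for e in entries]
    records = []
    for row in report.get("data", {}).get("rows", []):
        record: Dict[str, Any] = dict(zip(dim_names, row.get("dimensions", [])))
        # Assumes a single date range: only the first "values" list is used.
        values = row["metrics"][0]["values"]
        record.update({n: c(v) for n, c, v in zip(met_names, casts, values)})
        records.append(record)
    return records


# Hand-written sample report in the v4 shape (not real data).
sample_report = {
    "columnHeader": {
        "dimensions": ["ga:date", "ga:browser"],
        "metricHeader": {
            "metricHeaderEntries": [{"name": "ga:pageviews", "type": "INTEGER"}]
        },
    },
    "data": {
        "rows": [
            {"dimensions": ["20171123", "Firefox"], "metrics": [{"values": ["42"]}]}
        ]
    },
}
print(v4_report_to_records(sample_report))
```

With several date ranges you would need to decide how to name the extra metric columns, which is why the sketch restricts itself to one.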

Upvotes: 7
