Reputation: 33
So I have worked through the Hello Analytics tutorial to confirm that OAuth2 is working as expected for me, but I'm not having any luck with the pandas.io.ga module. In particular, I am stuck with this error:
In [1]: from pandas.io import ga
In [2]: df = ga.read_ga("pageviews", "pagePath", "2014-07-08")
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1162: FutureWarning: using '-' to provide set differences
with Indexes is deprecated, use .difference()
"use .difference()",FutureWarning)
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1147: FutureWarning: using '+' to provide set union with
Indexes is deprecated, use '|' or .union()
"use '|' or .union()",FutureWarning)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-b5343faf9ae6> in <module>()
----> 1 df = ga.read_ga("pageviews", "pagePath", "2014-07-08")
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in read_ga(metrics, dimensions, start_date, **kwargs)
105 reader = GAnalytics(**reader_kwds)
106 return reader.get_data(metrics=metrics, start_date=start_date,
--> 107 dimensions=dimensions, **kwargs)
108
109
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in get_data(self, metrics, start_date, end_date, dimensions,
segment, filters, start_index, max_results, index_col, parse_dates, keep_date_col, date_parser, na_values, converters,
sort, dayfirst, account_name, account_id, property_name, property_id, profile_name, profile_id, chunksize)
293
294 if chunksize is None:
--> 295 return _read(start_index, max_results)
296
297 def iterator():
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _read(start, result_size)
287 dayfirst=dayfirst,
288 na_values=na_values,
--> 289 converters=converters, sort=sort)
290 except HttpError as inst:
291 raise ValueError('Google API error %s: %s' % (inst.resp.status,
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _parse_data(self, rows, col_info, index_col, parse_dates,
keep_date_col, date_parser, dayfirst, na_values, converters, sort)
313 keep_date_col=keep_date_col,
314 converters=converters,
--> 315 header=None, names=col_names))
316
317 if isinstance(sort, bool) and sort:
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
237
238 # Create the parser.
--> 239 parser = TextFileReader(filepath_or_buffer, **kwds)
240
241 if (nrows is not None) and (chunksize is not None):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
551 self.options['has_index_names'] = kwds['has_index_names']
552
--> 553 self._make_engine(self.engine)
554
555 def _get_options_with_defaults(self, engine):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
694 elif engine == 'python-fwf':
695 klass = FixedWidthFieldParser
--> 696 self._engine = klass(self.f, **self.options)
697
698 def _failover_to_python(self):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, **kwds)
1412 if not self._has_complex_date_col:
1413 (index_names,
-> 1414 self.orig_names, self.columns) = self._get_index_name(self.columns)
1415 self._name_processed = True
1416 if self.index_names is None:
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _get_index_name(self, columns)
1886 # Case 2
1887 (index_name, columns_,
-> 1888 self.index_col) = _clean_index_names(columns, self.index_col)
1889
1890 return index_name, orig_names, columns
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _clean_index_names(columns, index_col)
2171 break
2172 else:
-> 2173 name = cp_cols[c]
2174 columns.remove(name)
2175 index_names.append(name)
TypeError: list indices must be integers, not Index
OAuth2 is working as expected and I have only used these parameters as demo variables--the query itself is junk. Basically, I cannot figure out where the error is coming from, and would appreciate any pointers that one may have.
Thanks!
SOLUTION (SORT OF)
Not sure if this has to do with the data I'm trying to access or what, but the offending Index type error I'm getting arises from the the index_col variable in pandas.io.ga.GDataReader.get_data() is of type pandas.core.index.Index. This is fed to pandas.io.parsers._read() in _parse_data() which falls over. I don't understand this, but it is the breaking point for me.
As a fix--if anyone else is having this problem--I have edited line 270 of ga.py to:
index_col = _clean_index(list(dimensions), parse_dates).tolist()
and everything is now smooth as butter, but I suspect this may break things in other situations...
Upvotes: 1
Views: 644
Reputation: 7070
Unfortunately, this module isn't really documented and the errors aren't always meaningful. Include your account_name
, property_name
and profile_name
(profile_name
is the View
in the online version). Then include some dimensions
and metrics
you are interested in. Also make sure that the client_secrets.json
is in the pandas.io
directory. An example:
ga.read_ga(account_name=account_name,
property_name=property_name,
profile_name=profile_name,
dimensions=['date', 'hour', 'minute'],
metrics=['pageviews'],
start_date=start_date,
end_date=end_date,
index_col=0,
parse_dates={'datetime': ['date', 'hour', 'minute']},
date_parser=lambda x: datetime.strptime(x, '%Y%m%d %H %M'),
max_results=max_results)
Also have a look at my recent step by step blog post about GA with pandas.
Upvotes: 1