Prof. Falken
Prof. Falken

Reputation: 549

KeyError Using Google DataLab on datetime64 dtype

I am using Googles datalab to query BigQuery and use that data for an RNN. I am trying to convert the majority of the data to a numeric format but I am receiving a KeyError for the "Date" column. If I leave it as a string my code works but I need the date field to remain a datetime data type for the RNN. My code and the error are posted below.

import datalab.bigquery as bq
import seaborn as sns
import pandas as pd
import numpy as np
import tensorflow as tf
import shutil

%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

%%sql --m log_data
SELECT resource, place_id, CAST(end_time AS TIMESTAMP) AS Date, device, device_os, device_os_version, CAST(latency AS FLOAT) AS Latency,
CAST(megacycles AS FLOAT) AS Megacycles, method, CAST(response_size AS FLOAT) AS Response_Size, CAST(cost AS FLOAT) AS Cost, status,
device_brand, device_family, browser_version, app, ua_parse
FROM [project:app_logs_data] 
WHERE DATE(CAST(end_time AS DATE)) >='2018-03-09' AND DATE(CAST(end_time AS DATE)) <='2018-03-10'
GROUP BY resource, place_id, Date, device, device_os, device_os_version, Latency, Megacycles, method,
        Response_size, Cost, status, device_brand, device_family, browser_version, app, ua_parse

logs = bq.Query(log_data).to_dataframe()
logs['Date'] = pd.to_datetime(logs['Date'])
logs['resource'].apply(str)

logs.dtypes
resource                     object
place_id                     object
Date                 datetime64[ns]
device                       object
device_os                    object
device_os_version            object
Latency                     float64
Megacycles                  float64
method                       object
Response_Size               float64
Cost                        float64
status                       object
device_brand                 object
device_family                object
browser_version              object
app                          object
ua_parse                     object
dtype: object

def handle_non_numerical_data(logs):
    columns = logs.columns.values

    for column in columns:
        text_digit_vals = {}
        def convert_to_int(val):
            return text_digit_vals[val]
        if logs[column].dtype != np.int64 and \
                logs[column].dtype != np.float64 and logs[column].dtype != np.datetime64:
            column_contents = logs[column].values.tolist()
            unique_elements = set(column_contents)
            x = 0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    text_digit_vals[unique] = x
                    x+=1
            logs[column] = list(map(convert_to_int, logs[column]))
    return logs

logs = handle_non_numerical_data(logs)


KeyErrorTraceback (most recent call last)
<ipython-input-8-27437e47a873> in <module>()
     17     return logs
     18 
---> 19 logs = handle_non_numerical_data(logs)

<ipython-input-8-27437e47a873> in handle_non_numerical_data(logs)
     14                     text_digit_vals[unique] = x
     15                     x+=1
---> 16             logs[column] = list(map(convert_to_int, logs[column]))
     17     return logs
     18 

<ipython-input-8-27437e47a873> in convert_to_int(val)
      5         text_digit_vals = {}
      6         def convert_to_int(val):
----> 7             return text_digit_vals[val]
      8         if logs[column].dtype != np.int64 and                 logs[column].dtype != np.float64 and logs[column].dtype != np.datetime64:
      9             column_contents = logs[column].values.tolist()

KeyError: Timestamp('2018-03-09 00:00:00')

Upvotes: 0

Views: 207

Answers (1)

yelsayed
yelsayed

Reputation: 5532

The datalab.* is a deprecated library, you should switch to using google.datalab.*, and the %%bq instead of %%sql. See if that has already been fixed in the new library.

Upvotes: 2

Related Questions