Rohit Mourya

Reputation: 285

GaussianNB: Could not convert string to float: 'Thu Apr 16 23:58:58 2015'

I'm trying to solve a machine learning problem using GaussianNB. Some fields are stored as UNIX timestamps rather than in a readable date format, so I convert them. For example, the column state_changed_at has the value 1449619185 in the CSV, and I convert it into a proper date format.

When I select those date features to train my model, I get this error:

Could not convert string to float: 'Thu Apr 16 23:58:58 2015'

import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import MultinomialNB
import time
from sklearn.naive_bayes import GaussianNB

train = pd.read_csv("datasets/train2.csv")
test = pd.read_csv("datasets/test.csv")
train.head()

# state_changed_at, deadline, created_at and launched_at hold UNIX timestamps;
# convert them into readable date strings
unix_cols = ['deadline','state_changed_at','launched_at','created_at']

for x in unix_cols:
    train[x] = train[x].apply(lambda k: time.ctime(k))
    test[x] = test[x].apply(lambda k: time.ctime(k))


#   state_changed_at,deadline,created_at,launched_at are date time fields.
cols_to_use = ['keywords_len' ,'keywords_count','state_changed_at','deadline','created_at','launched_at']

target = train['final_status']

# data for modeling
k_train = train[cols_to_use]
k_test = test[cols_to_use]


gnb = GaussianNB()

model = MultinomialNB()
model.fit(k_train, target)  # this line raises: could not convert string to float: 'Thu Apr 16 23:58:58 2015'

predicted = model.predict(k_test)
# score against the known training labels; scoring k_test against its own predictions always gives 1.0
print(model.score(k_train, target))

Upvotes: 0

Views: 687

Answers (1)

Prune

Reputation: 77857

What is your confusion? After time.ctime, those columns hold date strings, not numbers. scikit-learn's fit doesn't accept that type. Specifically:

Parameters: 
X : array-like, dtype=float64, size=[n_samples, n_features]
Y : array, dtype=float64, size=[n_samples]
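
To see where the string comes from, here is a minimal standard-library sketch. It assumes only the sample value 1449619185 quoted in the question; the variable names are illustrative. time.ctime returns a local-time string, and float() cannot parse it, which is essentially the conversion that fails inside fit.

```python
import time

# Sample epoch value quoted in the question (seconds since 1970-01-01 UTC).
epoch_seconds = 1449619185

# time.ctime formats the timestamp as a human-readable local-time string.
as_text = time.ctime(epoch_seconds)
print(type(as_text).__name__, repr(as_text))

# A string like this cannot be parsed as a float, so the conversion raises
# ValueError, matching the message in the question.
try:
    float(as_text)
    converted = True
except ValueError as err:
    converted = False
    print("ValueError:", err)

# The original integer converts without any trouble.
as_float = float(epoch_seconds)
print(as_float)
```

The exact text of the string depends on the machine's time zone, so no expected output is shown for it.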

If you want to train on times, you need to cast (or leave) them as an ingestible type, i.e. basic numeric. Can you leave them in their original, epoch-based integer form while you perform the fitting?
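
A rough sketch of that suggestion follows. The data frame is synthetic (random values standing in for train2.csv, which is not available here), and the deadline_readable column is a hypothetical name for display purposes only. The column names and cols_to_use are taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the question's training data (an assumption).
rng = np.random.default_rng(0)
n = 200
created = rng.integers(1_400_000_000, 1_450_000_000, size=n)
launched = created + rng.integers(3_600, 30 * 86_400, size=n)
deadline = launched + rng.integers(86_400, 60 * 86_400, size=n)

train = pd.DataFrame({
    "keywords_len": rng.integers(5, 60, size=n),
    "keywords_count": rng.integers(1, 10, size=n),
    "created_at": created,
    "launched_at": launched,
    "deadline": deadline,
    "state_changed_at": deadline + rng.integers(0, 86_400, size=n),
    "final_status": rng.integers(0, 2, size=n),
})

# Keep a readable copy for inspection, but do not feed it to the model.
train["deadline_readable"] = pd.to_datetime(train["deadline"], unit="s")

cols_to_use = ["keywords_len", "keywords_count", "state_changed_at",
               "deadline", "created_at", "launched_at"]

# The epoch columns stay numeric, so fit can convert them to floats.
X = train[cols_to_use].astype("float64")
y = train["final_status"]

model = GaussianNB()
model.fit(X, y)

predicted = model.predict(X)
accuracy = model.score(X, y)  # accuracy on the training data itself
print(accuracy)
```

Since the labels here are random, the score says nothing about the real data; the point is only that fit runs once the date columns are numeric. Differences such as deadline minus launched_at may be more informative features than raw timestamps, but that is a modeling choice outside the original question.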

Upvotes: 1
