Reputation: 11
I keep getting this error; any help is appreciated.
I have been trying this, but a ValueError shows up: https://colab.research.google.com/drive/1jEmsG9WWRpUmuU92URD0PxtzWkpETlY3?usp=sharing
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import datetime
df = pd.read_csv('/content/covid_19_india.csv')
df['split'] = np.random.randn(df.shape[0], 1)
msk = np.random.rand(len(df)) <= 0.7
training = df[msk]
test = df[~msk]
xtrain = training.drop('Sno', axis=1)
ytrain = training.loc[:, 'Sno']
xtest = test.drop('Sno', axis=1)
ytest = test.loc[:, 'Sno']
model = GaussianNB()
model.fit(xtrain, ytrain)
pred = model.predict(xtest)
mat = confusion_matrix(pred, ytest)
names = np.unique(pred)
sns.heatmap(mat, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=names, yticklabels=names)
plt.xlabel('Truth')
plt.ylabel('Predicted')
ValueError                                Traceback (most recent call last)
in ()
     31
     32 # Train the model
---> 33 model.fit(xtrain, ytrain)
     34
     35 # Predict Output

6 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86
     87

ValueError: could not convert string to float: '30/01/20'
Any help would be appreciated.
Upvotes: 0
Views: 3099
Reputation: 893
As has been mentioned in both the comments and the other answer to this question, you have a column of dates formatted as strings in your dataset. You have a couple of options here.
For the sake of argument, let's say your dates are in a column named 'date', i.e. df['date']. You can simply drop the date column if you do not want to use it.
df = df.drop('date', axis=1)
Another option is to convert this column to datetime format. This can be done using apply() and datetime.datetime.strptime. If you're an aspiring data scientist, you should read and then bookmark https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior. It's pretty handy, I promise.
from datetime import datetime
df['date'] = df['date'].apply(lambda d: datetime.strptime(d, '%d/%m/%y'))
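As a rough sketch of the same conversion without apply(), pd.to_datetime can parse the whole column at once. The sample rows, the 'cases' column, and the derived 'days_since_start' column are made up for illustration, and 'date' is still the assumed column name. Because an estimator such as GaussianNB expects numeric input, the sketch also turns the parsed dates into a day count, which is only one possible choice of feature.

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV; the column names are assumptions.
df = pd.DataFrame({'date': ['30/01/20', '31/01/20', '01/02/20'],
                   'cases': [1, 1, 2]})

# The day comes first in '30/01/20', so the format is %d/%m/%y.
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y')

# Replace the datetime column with a plain integer: days since the earliest date.
df['days_since_start'] = (df['date'] - df['date'].min()).dt.days
features = df.drop('date', axis=1)
```

The resulting features frame holds only numeric columns, so it is the kind of input model.fit() can convert to float.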
Upvotes: 0
Reputation: 30991
The error message says that the string "30/01/20" could not be converted to float.
So it seems that your DataFrame contains a column with dates.
Note that when read_csv reads source data, it attempts to convert numeric columns to either int or float, but other columns (which cannot be converted this way) are left as strings, and these columns have the object data type.
Start by identifying which column contains dates. Then, to convert this column to datetime as early as the reading phase, pass the parse_dates parameter to read_csv with a list of the column names to be converted.
Then at least there should be no problem with conversion to float.
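A minimal sketch of those two steps, using an in-memory CSV instead of the real file. The column names 'Sno', 'Date' and 'Confirmed' and the sample rows are assumptions for illustration; substitute whatever your file actually contains. dayfirst=True is passed because the dates in the error message put the day first.

```python
import io
import pandas as pd

# Made-up CSV text; in the question this would be '/content/covid_19_india.csv'.
csv_text = "Sno,Date,Confirmed\n1,30/01/20,1\n2,31/01/20,1\n3,01/02/20,2\n"

# Step 1: read as-is and list the columns that were not converted to numbers.
raw = pd.read_csv(io.StringIO(csv_text))
string_cols = [c for c in raw.columns
               if not pd.api.types.is_numeric_dtype(raw[c])]

# Step 2: re-read, asking pandas to parse the date column (day before month).
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)
```

Depending on the estimator, a parsed datetime column may still need to be turned into a number (or dropped) before calling fit().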
Upvotes: 0