Reputation: 3504
I have a bunch of data that I am reading from a CSV file into pandas with:
df = pd.read_csv('C:\\User\\desktop\\master.csv', parse_dates=[['Date', 'Time']])
The CSV file looks like this:
Date Time kW
3/1/2011 12:15:00 AM 171.36
3/1/2011 12:30:00 AM 181.44
3/1/2011 12:45:00 AM 175.68
3/1/2011 1:00:00 AM 180.00
3/1/2011 1:15:00 AM 175.68
Calling df.head() prints:
Date_Time kW
0 2011-03-01 00:15:00 171.36
1 2011-03-01 00:30:00 181.44
2 2011-03-01 00:45:00 175.68
3 2011-03-01 01:00:00 180.00
For my machine learning experiment, I am attempting to add some additional columns based on the timestamp: month, day of week, hour, and minute.
df['month'] = df.Date_Time.dt.month
df['Day_of_week'] = df.Date_Time.dt.dayofweek
df['hour'] = df.Date_Time.dt.hour
df['minute'] = df.Date_Time.dt.minute
Without really knowing what I am doing, my scikit-learn code is below. I am attempting to follow this SO post, which deals with the same TypeError.
columns = df.columns.tolist()
columns = [c for c in columns if c not in ['kW', 'date']]
from sklearn import tree
clf = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf = (len(df)/100) )
clf = clf.fit(df[columns],df['kW'])
This generates the same error as the SO post above, but that post's solution isn't fixing my issue:
float() argument must be a string or a number, not 'Timestamp'
EDIT
If I print df.dtypes:
Date_Time datetime64[ns]
kW float64
month int64
Day_of_week int64
hour int64
minute int64
dtype: object
If I print df.columns:
Index(['Date_Time', 'kW', 'month', 'Day_of_week', 'hour', 'minute'], dtype='object')
Upvotes: 1
Views: 9863
Reputation: 470
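Your exclusion list names a column, 'date', that does not exist in your DataFrame, so the Date_Time column is still being passed to the classifier. I guess you need to replace this line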
columns = [c for c in columns if c not in ['kW', 'date']]
with this
columns = [c for c in columns if c not in ['kW', 'Date_Time']]
Your code should look like this:
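from sklearn import tree

# Features: every column except the target ('kW') and the raw timestamp ('Date_Time')
columns = df.columns.tolist()
columns = [c for c in columns if c not in ['kW', 'Date_Time']]
# Integer division keeps min_samples_leaf an int; max(1, ...) guards small DataFrames
clf = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf=max(1, len(df) // 100))
clf = clf.fit(df[columns], df['kW'])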
scikit-learn estimators cannot use string or datetime columns directly. They accept only numerical data types.
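Here is a minimal end-to-end sketch of the fix that you can run without your CSV. The DataFrame is a small synthetic stand-in for master.csv (the last three kW values are made up). One assumption on my part: since kW is a continuous value, I have swapped in DecisionTreeRegressor, because a classifier expects discrete class labels. Treat this as an illustration rather than the definitive setup for your data.

```python
import pandas as pd
from sklearn import tree

# Synthetic stand-in for master.csv; the last three kW values are invented
df = pd.DataFrame({
    "Date_Time": pd.date_range("2011-03-01 00:15", periods=8, freq="15min"),
    "kW": [171.36, 181.44, 175.68, 180.00, 175.68, 172.80, 178.56, 174.24],
})

# Same derived features as in the question
df["month"] = df["Date_Time"].dt.month
df["Day_of_week"] = df["Date_Time"].dt.dayofweek
df["hour"] = df["Date_Time"].dt.hour
df["minute"] = df["Date_Time"].dt.minute

# Exclude the target and the raw timestamp so only numeric features remain
columns = [c for c in df.columns if c not in ["kW", "Date_Time"]]

# Assumption: kW is continuous, so a regressor is used instead of the classifier
reg = tree.DecisionTreeRegressor(
    max_depth=2,
    min_samples_leaf=max(1, len(df) // 100),
)
reg.fit(df[columns], df["kW"])
predictions = reg.predict(df[columns])
```

The timestamp itself never reaches the estimator here. Only the integer columns derived from it do.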
You can check your columns' dtypes using the df.dtypes attribute.
If any of your columns have an 'object' or 'datetime' dtype, add them to the ['kW', 'Date_Time'] exclusion list.
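Instead of maintaining the exclusion list by hand, you could let pandas pick out the numeric columns for you with select_dtypes. This is only a sketch: the small DataFrame below is made up, and the 'site' column is a hypothetical string column that is not in your data, added just to show that non-numeric columns are filtered out.

```python
import pandas as pd

# Made-up frame; 'site' is a hypothetical string column for illustration only
df = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2011-03-01 00:15:00", "2011-03-01 00:30:00"]),
    "kW": [171.36, 181.44],
    "hour": [0, 0],
    "minute": [15, 30],
    "site": ["A", "B"],
})

# Keep only numeric dtypes (ints and floats); datetime and string columns drop out
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# The target still has to be removed from the feature list explicitly
features = [c for c in numeric_cols if c != "kW"]

# Columns that were filtered out, useful for a quick sanity check
excluded = [c for c in df.columns if c not in numeric_cols]
```

You would then fit with df[features] in place of df[columns].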
Upvotes: 1