bbartling

Reputation: 3504

TypeError: float() argument must be a string or a number, not 'Timestamp'

I am reading a bunch of data from a CSV file like the one below into pandas with df = pd.read_csv('C:\\User\\desktop\\master.csv', parse_dates=[['Date', 'Time']]):

Date     Time        kW
3/1/2011 12:15:00 AM 171.36
3/1/2011 12:30:00 AM 181.44
3/1/2011 12:45:00 AM 175.68
3/1/2011 1:00:00 AM 180.00
3/1/2011 1:15:00 AM 175.68

Calling df.head() prints:

            Date_Time    kW
0 2011-03-01 00:15:00 171.36
1 2011-03-01 00:30:00 181.44
2 2011-03-01 00:45:00 175.68
3 2011-03-01 01:00:00 180.00

For my machine learning experiment, I am attempting to add some additional columns based on the timestamp: month, day of week, hour, and minute.

df['month'] = df.Date_Time.dt.month
df['Day_of_week'] = df.Date_Time.dt.dayofweek
df['hour'] = df.Date_Time.dt.hour
df['minute'] = df.Date_Time.dt.minute

Without really knowing what I am doing, my scikit-learn code is below, where I am attempting to follow this SO post about the same TypeError.

columns = df.columns.tolist()
columns = [c for c in columns if c not in ['kW', 'date']]
from sklearn import tree
clf = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf = (len(df)/100) )
clf = clf.fit(df[columns],df['kW'])

This generates the same error as the SO post above, but the solution there isn't fixing my issue: float() argument must be a string or a number, not 'Timestamp'

EDIT

Printing df.dtypes gives:

Date_Time      datetime64[ns]
kW                    float64
month                   int64
Day_of_week             int64
hour                    int64
minute                  int64
dtype: object

Printing df.columns gives: Index(['Date_Time', 'kW', 'month', 'Day_of_week', 'hour', 'minute'], dtype='object')

Upvotes: 1

Views: 9863

Answers (1)

Sergei

Reputation: 470

I guess you need to replace this line

columns = [c for c in columns if c not in ['kW', 'date']]

with this

columns = [c for c in columns if c not in ['kW', 'Date_Time']]

Your code should look like this:

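from sklearn import tree

columns = df.columns.tolist()
# Exclude the target ('kW') and the datetime column from the features
columns = [c for c in columns if c not in ['kW', 'Date_Time']]
# Integer division: a float min_samples_leaf is treated as a fraction of the samples
clf = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf=len(df) // 100)
clf = clf.fit(df[columns], df['kW'])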

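scikit-learn estimators like this one accept only numerical data, so string and datetime columns cannot be passed in as-is. Here the Date_Time column is datetime64[ns], and converting its Timestamp values to float is what raises the error. You can check your columns' dtypes using the df.dtypes attribute.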
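As a rough illustration of that dtype check, here is a minimal sketch that rebuilds a few rows from the question and lists the columns that are not numeric. The small DataFrame is only a stand-in for the real CSV, and the name non_numeric is made up for this example.

```python
import pandas as pd

# Stand-in for the question's data (values copied from the post).
df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2011-03-01 00:15:00",
        "2011-03-01 00:30:00",
        "2011-03-01 00:45:00",
        "2011-03-01 01:00:00",
    ]),
    "kW": [171.36, 181.44, 175.68, 180.00],
})

# Same derived columns as in the question.
df["month"] = df["Date_Time"].dt.month
df["Day_of_week"] = df["Date_Time"].dt.dayofweek
df["hour"] = df["Date_Time"].dt.hour
df["minute"] = df["Date_Time"].dt.minute

# Collect every column whose dtype is not numeric; these are the ones
# that have to be excluded (or converted) before calling fit().
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
print(non_numeric)
```

With the columns from the question, only Date_Time should show up in that list.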

If any of your columns have an 'object' or 'datetime' dtype, add them to the ['kW', 'Date_Time'] exclusion list.
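Instead of maintaining the exclusion list by hand, one option is to let pandas pick the numeric columns. The sketch below is not the code from the question: it uses made-up kW readings on a 15-minute index, and it swaps in DecisionTreeRegressor, because kW is a continuous value and DecisionTreeClassifier rejects continuous targets. The max(1, ...) guard is also my own addition so the tiny example frame still works.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical sample data: five 15-minute readings like those in the question.
df = pd.DataFrame({
    "Date_Time": pd.date_range("2011-03-01 00:15", periods=5, freq="15min"),
    "kW": [171.36, 181.44, 175.68, 180.00, 175.68],
})
df["month"] = df["Date_Time"].dt.month
df["Day_of_week"] = df["Date_Time"].dt.dayofweek
df["hour"] = df["Date_Time"].dt.hour
df["minute"] = df["Date_Time"].dt.minute

# Keep only numeric columns, then drop the target; Date_Time is
# filtered out automatically because datetime64 is not a numeric dtype.
features = df.select_dtypes(include="number").columns.drop("kW").tolist()

# min_samples_leaf must be an int >= 1 (a float is read as a fraction),
# so use integer division and guard against very small frames.
model = DecisionTreeRegressor(
    max_depth=2,
    min_samples_leaf=max(1, len(df) // 100),
)
model.fit(df[features], df["kW"])
predictions = model.predict(df[features])
```

If the load really should be treated as classes (for example low/medium/high), the target would first have to be binned into discrete labels before a classifier could be used.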

Upvotes: 1
