Reputation: 578
I am trying to forecast time series data.
The timestamps in my CSV file are in the form 0:00.000.
Hence, I parsed the time column and set it as the index as follows:
df.columns=['Elapsed','I']
df['Elapsed']=pd.to_datetime(df['Elapsed'], format='%H:%M.%S%f')
df['Elapsed']=df['Elapsed'].dt.time
df.set_index('Elapsed', inplace=True)
Then later I split my data into a train section and a test section:
train = df.loc['0:00.000':'0:28.778']
test = df.loc['0:28.779':]
My stack trace is
An extract of my data is:
Can anyone explain how to prevent this error from occurring?
Upvotes: 1
Views: 2078
Reputation: 8595
Since the question has now changed, I'll write a new answer.
Your dataframe is indexed by instances of datetime.time, but you're trying to slice it with strings. pandas doesn't want to compare strings with times.
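The underlying problem can be reproduced without pandas. The snippet below is only a sketch with made-up values (the names `elapsed` and `label` are mine, not from the question); it shows that Python itself refuses to order a `datetime.time` against a `str`, which is the comparison a string slice on this index ends up needing.

```python
import datetime

# An index entry as produced by .dt.time, and a slice bound given as a string.
elapsed = datetime.time(0, 28, 7)
label = "0:28.778"

try:
    elapsed < label          # time vs. str has no defined ordering
    comparison_failed = False
except TypeError:
    comparison_failed = True

print(comparison_failed)
```

Converting the slice bounds to `datetime.time` first means both sides of the comparison have the same type.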
To get your slicing to work, try this:
import datetime
# Parse the slice bounds with the same format that was used to build the index
split_from = datetime.datetime.strptime('0:00.000', '%H:%M.%S%f').time()
split_to = datetime.datetime.strptime('0:28.778', '%H:%M.%S%f').time()
train = df[split_from:split_to]
It would also be useful to hold the format in a variable since you're now using it in several places.
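One way to do that is sketched below. `ELAPSED_FORMAT` and `parse_elapsed` are names I've made up for illustration; the format string itself is the one from the question, unchanged.

```python
import datetime

# Single place to keep the format, so the index and the slice bounds
# are always parsed the same way.
ELAPSED_FORMAT = '%H:%M.%S%f'

def parse_elapsed(text):
    """Hypothetical helper: turn a string such as '0:28.778' into a datetime.time."""
    return datetime.datetime.strptime(text, ELAPSED_FORMAT).time()

split_from = parse_elapsed('0:00.000')
split_to = parse_elapsed('0:28.778')

print(split_from, split_to)
```

The same constant can then be passed as `format=ELAPSED_FORMAT` to `pd.to_datetime` when building the index.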
Or if you have fixed split times, you could instead do
split_from = datetime.time(0, 0, 0)
split_to = datetime.time(0, 28, 7, 780000)  # time() takes integers only; this is what '0:28.778' parses to with the format above
train = df[split_from:split_to]
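For completeness, here is a rough sketch of a full train/test split on a time-indexed frame. The four rows are invented for the example (I haven't seen the real data), and it assumes the index is sorted. Label slices include both end points, so the test set is taken with a strict comparison to avoid putting the boundary row in both halves.

```python
import datetime
import pandas as pd

# Toy frame indexed by datetime.time objects, standing in for the real data.
times = [
    datetime.time(0, 0, 0),
    datetime.time(0, 10, 0),
    datetime.time(0, 28, 7, 780000),
    datetime.time(0, 30, 0),
]
df = pd.DataFrame({'I': [1.0, 2.0, 3.0, 4.0]}, index=pd.Index(times, name='Elapsed'))

split_to = datetime.time(0, 28, 7, 780000)

train = df.loc[:split_to]            # label slice, end point included
test = df.loc[df.index > split_to]   # everything strictly after the split

print(len(train), len(test))
```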
Upvotes: 1
Reputation: 8595
Without seeing your data, I'm just guessing, but here goes:
I'm guessing your original data in the 'Elapsed' column looks like
'12:34.5678'
'12:35.1234'
In particular, it has quotes on each side of the numbers. Otherwise your line
df['Elapsed']=pd.to_datetime(df['Elapsed'], format="'%H:%M.%S%f'")
would fail.
So the error message is telling you that your slicing times have the wrong format: they are missing quotes on each side. Change it to
train = df.loc["'0:00.000'":"'0:28.778'"]
(likewise for the next line) and hopefully that will sort it out.
If you can extract your source data in a way that avoids having quote characters in the timestamps, you'll probably find things a little simpler.
Upvotes: 0