Reputation: 221
I am parsing dates from a Dataframe where the data comes from a CSV file. I get the said error. I am sure the format is correct.
My Code:
import pandas as pd
from datetime import datetime
import csv
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
headers = ['Sensor Value','Date','Time']
df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV',names=headers)
print (df)
df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
x = df['Date']
y = df['Sensor Value']
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
My DataFrame:
0 Sensor Value Date Time
1 2 2017/02/17 19:06:17.188
2 72 2017/02/17 19:06:22.360
3 72 2017/02/17 19:06:27.348
Console Error:
new_values = map_f(values, arg)
File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
File "C:/Users/Lala Rushan/PycharmProjects/newgraph/newgraph.py", line 10, in <lambda>
df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
File "C:\Users\Lala Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 500, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "C:\Users\Lala Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data 'Date' does not match format '%Y/%m/%d %H:%M:%S.%f'
CSV Input:
Sensor Value Date Time
2 2017/02/17 19:06:17.188
72 2017/02/17 19:06:22.360
72 2017/02/17 19:06:27.348
72 2017/02/17 19:06:32.482
74 2017/02/17 19:06:37.515
70 2017/02/17 19:06:42.580
Upvotes: 1
Views: 9923
Reputation: 2119
I had a similar problem while parsing dates in a large csv file. In my case, I had a few bad lines in the csv that were triggering the error, so I simply removed them from the dataframe, to later parse the dates.
If you don't mind to lose this information, you can do something like this:
df = df[df['Date'].str.contains(r'^\d{4}-\d{2}-\d{2} \d{2}\:\d{2}:\d{2}.\d{3}')]
timer = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
df['Date'] = df['Date'].apply(timer)
Upvotes: 0
Reputation: 15941
Remove the names=headers
part as this is confusing pandas
. pandas
assumes that the first row is a header rowby default. By specifying the header names, it assumes that the first row must be data and so your error is that the word Date does not match your format (which it doesn't)
EDIT: just change the read_csv
line to this:
df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV')
Consequently, you don't have to specify headers
anymore so you can remove that line.
EDIT2:
The problem is that your date and time fields are deperate. Create a new column, called DateTime, which combines the two and then apply your striptime
function to that.
Replace the line I told you to delete, the df['Date'] = df['Date'].map
one with this
df['DateTime'] = df['Date'] + " " + df['Time']
df['DateTime'] = df['DateTime'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
Upvotes: 0
Reputation: 862601
I think you need to_datetime
with parameter errors='coerce'
for parse problematic values to NaN
:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
For check problematic rows:
print (df[pd.to_datetime(df['Date'], errors='coerce').isnull()])
But if need read columns date
and time to datetime
use parameter parse_dates
in read_csv
:
import pandas as pd
from pandas.compat import StringIO
temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/17,19:06:42.580"""
#after testing replace 'StringIO(temp)' to 'C:/Users\Lala Rushan\Downloads\DataLog.CSV'
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
Datetime Sensor Value
0 2017-02-17 19:06:17.188 2
1 2017-02-17 19:06:22.360 72
2 2017-02-17 19:06:27.348 72
3 2017-02-17 19:06:32.482 72
4 2017-02-17 19:06:37.515 74
5 2017-02-17 19:06:42.580 70
print (df.dtypes)
Datetime datetime64[ns]
Sensor Value int64
dtype: object
First solution combined with second, where in last row is 30.2.2017
what does not exist:
temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/30,19:06:42.580"""
#after testing replace 'StringIO(temp)' to 'C:/Users\Lala Rushan\Downloads\DataLog.CSV'
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
Datetime Sensor Value
0 2017/02/17 19:06:17.188 2
1 2017/02/17 19:06:22.360 72
2 2017/02/17 19:06:27.348 72
3 2017/02/17 19:06:32.482 72
4 2017/02/17 19:06:37.515 74
5 2017/02/30 19:06:42.580 70
df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce')
print (df)
Datetime Sensor Value
0 2017-02-17 19:06:17.188 2
1 2017-02-17 19:06:22.360 72
2 2017-02-17 19:06:27.348 72
3 2017-02-17 19:06:32.482 72
4 2017-02-17 19:06:37.515 74
5 NaT 70 <- replace 30.2.2017 to NaT (same as NaN for dates)
print (df.dtypes)
Datetime datetime64[ns]
Sensor Value int64
dtype: object
Upvotes: 3