John Doe
John Doe

Reputation: 55

Parsing dates in pandas

So I need to read a csv file that contains date and numerical value with a format like this: 2017-01-01 00:00:00,-350.7866656.

Here is my code:

def parser(x):                                                                                                         
    return datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = read_csv('filename', header=None, parse_dates=[0], index_col=0, 
squeeze=True, date_parser=parser)

print(df.head())

What I get is an error: ValueError: unconverted data remains: ,-350.7866656 As I understand program treats string as a single column instead of two and tries to parse whole string instead of first column with date. How do I fix this so it reads csv file into two columns?

Upvotes: 2

Views: 416

Answers (2)

Abhijit Ghate
Abhijit Ghate

Reputation: 382

I looked up the pandas.read_csv documentation. The parameter index_col which you have set to 0 is setting the date column in the file as the index. In my opinion, removing the parameter would yield the desired result,i.e. reading the CSV in two columns.

Cheers!

Upvotes: 0

jezrael
jezrael

Reputation: 862406

In my opinion, the problem is you may have some bad date values, which cannot be parsed by a single datetime format. Instead, add a try-except statement to a parser function:

import pandas as pd
import numpy as np
from datetime import datetime, time

temp=u"""2015-01-04 12:04:10,10
2018-01-10 18:40:12,20
-350.7866656,20"""

def parser(x): 
    try:                                                                                                        
        return datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    except ValueError:
        return np.nan

#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp),  header=None, parse_dates=[0], index_col=0, 
squeeze=True, date_parser=parser)

print (df)
0
2015-01-04 12:04:10    10
2018-01-10 18:40:12    20
NaT                    20
Name: 1, dtype: int64

print (df.index)
DatetimeIndex(['2015-01-04 12:04:10', '2018-01-10 18:40:12', 'NaT'], dtype='datetime64[ns]', name=0, freq=None)

Upvotes: 3

Related Questions