Pandas read_csv parse_dates=true missing out date column

Question

I have a CSV file (example below) that I'm trying to load into a dataframe, and I want pandas to parse the dates automatically.

"http://www.example.com","http://example.com","test",2016-06-16,2016-02-21,4

When I load this file specifying the columns to be parsed, they are successfully loaded as datetimes:

df = pd.read_csv(inputfile, parse_dates=[3,4])

However, I don't know that these dates will always be in columns 3 and 4, so I wanted pandas to attempt to parse each column and see whether it is a date. My understanding from the pandas docs was that this is accomplished by:

df = pd.read_csv(inputfile, parse_dates=True)

However this loads columns 3 & 4 as objects. Presumably I have misunderstood this. Is there a correct way to do this? Do I need to load the dataframe and then try to convert each column to a date?

(I'm running Canopy with Python 2.7.11 -- 64-bit on Windows 10)

user2285236 · Accepted Answer

parse_dates does not work like that. If you pass True, pandas only tries to parse the index as datetimes; it does not inspect the other columns:

parse_dates : boolean or list of ints or names or list of lists or dict, default False
    boolean. If True -> try parsing the index.
    list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
    list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
    dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
    Note: A fast-path exists for iso8601-formatted dates.
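To illustrate the "try parsing the index" behaviour, here is a minimal sketch. The CSV text and its column names (date, url, count) are made up for the example and are not the asker's file; the point is only that parse_dates=True has an effect when an index column is designated with index_col.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical sample data with a header row; the date is in the first column.
csv_text = (
    "date,url,count\n"
    "2016-06-16,http://www.example.com,4\n"
    "2016-02-21,http://example.com,7\n"
)

# With index_col=0 the first column becomes the index, and
# parse_dates=True asks pandas to try parsing that index as dates.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(is_datetime64_any_dtype(with_index.index))

# Without an index column there is nothing for parse_dates=True to parse,
# so the "date" column is left as plain text.
no_index = pd.read_csv(io.StringIO(csv_text), parse_dates=True)
print(is_datetime64_any_dtype(no_index["date"]))
```

The first check should report a datetime index and the second a non-datetime column, which matches what the question observed.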

For your case, you need to state explicitly which columns should be parsed as dates. If pandas tried every column instead, all your numerical columns could be converted to datetime as well.
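If the positions of the date columns really are unknown, one possible workaround is the one the question suggests: load the file first and then try converting each text column. The sketch below is not a built-in pandas feature. The helper name convert_date_columns is hypothetical, and it assumes the dates are in the YYYY-MM-DD form shown in the sample row. It skips numeric columns entirely and only converts a column when every non-missing value parses.

```python
import io

import pandas as pd
from pandas.api.types import (
    is_datetime64_any_dtype,
    is_object_dtype,
    is_string_dtype,
)


def convert_date_columns(df, fmt="%Y-%m-%d"):
    """Return a copy of df where fully parseable text columns become datetimes.

    Hypothetical helper; fmt is an assumption about the date layout.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        # Only look at text-like columns so numbers are never reinterpreted.
        if not (is_object_dtype(series) or is_string_dtype(series)):
            continue
        # Values that do not match fmt become NaT instead of raising.
        parsed = pd.to_datetime(series, format=fmt, errors="coerce")
        # Convert only if the column has values and all of them parsed.
        if series.notna().any() and parsed.notna().sum() == series.notna().sum():
            out[col] = parsed
    return out


# The sample row from the question; it has no header row, hence header=None.
sample = '"http://www.example.com","http://example.com","test",2016-06-16,2016-02-21,4\n'
raw = pd.read_csv(io.StringIO(sample), header=None)
result = convert_date_columns(raw)
print(result.dtypes)
```

Once the date columns are known, passing them explicitly through parse_dates, as in the question's first call, remains the more predictable option.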
