Dominic Woodman

Reputation: 839

Pandas read_csv with parse_dates=True does not parse date columns

I have a CSV file (example row below) that I'm trying to load into a DataFrame, and I want pandas to parse the dates automatically.

"http://www.example.com","http://example.com","test",2016-06-16,2016-02-21,4

When I load this file specifying the columns to be parsed, they are successfully loaded as datetimes:

df = pd.read_csv(inputfile, parse_dates=[3,4])

However, I don't know that the dates will always be in columns 3 and 4, so I wanted pandas to attempt to parse every column and detect which ones are dates. My understanding from the pandas docs was that this is accomplished by:

df = pd.read_csv(inputfile, parse_dates=True)

However, this loads columns 3 and 4 as objects, so presumably I have misunderstood the docs. Is there a correct way to do this? Do I need to load the DataFrame first and then try to convert each column to a date?

(I'm running Canopy with Python 2.7.11 -- 64-bit on Windows 10)

Upvotes: 4

Views: 12036

Answers (1)

user2285236


parse_dates does not work like that. If you pass True, pandas only tries to parse the index as dates; it does not scan the other columns. From the docs:

parse_dates : boolean or list of ints or names or list of lists or dict, default False

boolean. If True -> try parsing the index.
list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

Note: A fast-path exists for iso8601-formatted dates.
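As a small illustration of the boolean case quoted above, the sketch below reads a made-up CSV with index_col=0 and parse_dates=True. The column names and values are assumptions for the demo, not taken from the question. Only the index should come back as datetimes; the other date-like column is left unparsed.

```python
import io
import pandas as pd

# Made-up sample data: the first column becomes the index.
csv_text = (
    "date,url,updated,count\n"
    "2016-06-16,http://example.com,2016-02-21,4\n"
)

# parse_dates=True only attempts to parse the index, so index_col is set.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(df.index))        # the index is parsed as dates
print(df["updated"].dtype)   # this column is not converted to datetime
```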

For your case, you need to state explicitly which columns should be parsed as dates. If pandas tried every column automatically, your numeric columns could be converted to datetimes as well.
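If the date columns really can't be known in advance, one possible workaround is to load the file first and then test each non-numeric column against an expected format, keeping the conversion only when every non-null value parses. This is only a sketch: parse_date_like_columns is a hypothetical helper, not a pandas function, and it assumes the dates use the %Y-%m-%d format shown in the question and that the file has no header row.

```python
import io
import pandas as pd


def parse_date_like_columns(df, fmt="%Y-%m-%d"):
    """Hypothetical helper: convert columns whose non-null values all match fmt."""
    out = df.copy()
    for col in out.columns:
        # Skip numeric columns so plain numbers are never read as timestamps.
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        # Values that do not match fmt become NaT instead of raising.
        converted = pd.to_datetime(out[col], format=fmt, errors="coerce")
        has_values = out[col].notna().any()
        # Keep the conversion only if no non-null value failed to parse.
        if has_values and converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    return out


# The sample row from the question, read without a header row.
csv_text = '"http://www.example.com","http://example.com","test",2016-06-16,2016-02-21,4\n'
df = pd.read_csv(io.StringIO(csv_text), header=None)

result = parse_date_like_columns(df)
print(result.dtypes)  # columns 3 and 4 should now be datetime columns
```

Requiring a strict format keeps URLs, free text, and integers from being misread as dates, which is the risk mentioned above. If the files can contain other date formats, fmt would need to be adjusted or several formats tried in turn.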

Upvotes: 6
