Python replace multiple date columns with datetime

Question

I have csv files with multiple headers:

Year', 'Month', 'Day', 'Hour', 'Minute' --other headers--

I would like to downsize these with datetime. Right now I'm doing:

date = datetime(year=int(d[0]), month=int(d[1]), day=int(d[2]]), hour=int(d[3]]), minute=int(d[4]]))

But how do I actually delete those columns and put the date column before all other columns? I eventually want the "date" to be the index in a pandas dataframe.

UPDATE: The first five rows of one of my CSV's is:

"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777

I would like to have

"Date","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
DATETIMEOBJECT,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
DATETIMEOBJECT,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
DATETIMEOBJECT,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
DATETIMEOBJECT,16.38,11.93,-999.00,160.68,10.47,179.45,88.777

akuiper · Accepted Answer

You can specify the parse_dates parameter as the indices of the date time related columns and write a customized date_parser function to convert the multiple columns as a datetime column. Specify the index_col if you want it to be the index:

import pandas as pd   
from io import StringIO   
pd.read_csv(StringIO("""
"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777"""), 
            sep = ",", 
            parse_dates={'Date': [0,1,2,3,4]}, 
            date_parser = lambda x: pd.to_datetime(x, format="%Y %m %d %H %M"), 
            index_col = ['Date'])

Update: The way date_parser function works according to pd.read_csv() docs:

1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

There are multiple ways this parameter can be used depending on how your customized date parser function is written, in the case given above, for instance, the second option is used, for every row, columns 0-4 are concatenated as strings separated by a whitespace and passed to pd.to_datetime() function and so a format of %Y %m %d %H %M works here.

Python replace multiple date columns with datetime

Answers (1)

Related Questions