Reputation: 745
I have csv files with multiple headers:
Year', 'Month', 'Day', 'Hour', 'Minute' --other headers--
I would like to downsize these with datetime. Right now I'm doing:
date = datetime(year=int(d[0]), month=int(d[1]), day=int(d[2]]), hour=int(d[3]]), minute=int(d[4]]))
But how do I actually delete those columns and put the date column before all other columns? I eventually want the "date" to be the index in a pandas dataframe.
UPDATE: The first five rows of one of my CSV's is:
"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777
I would like to have
"Date","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
DATETIMEOBJECT,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
DATETIMEOBJECT,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
DATETIMEOBJECT,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
DATETIMEOBJECT,16.38,11.93,-999.00,160.68,10.47,179.45,88.777
Upvotes: 1
Views: 803
Reputation: 214967
You can specify the parse_dates
parameter as the indices of the date time related columns and write a customized date_parser
function to convert the multiple columns as a datetime column. Specify the index_col
if you want it to be the index:
import pandas as pd
from io import StringIO
pd.read_csv(StringIO("""
"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777"""),
sep = ",",
parse_dates={'Date': [0,1,2,3,4]},
date_parser = lambda x: pd.to_datetime(x, format="%Y %m %d %H %M"),
index_col = ['Date'])
Update:
The way date_parser
function works according to pd.read_csv()
docs:
1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
There are multiple ways this parameter can be used depending on how your customized date parser function is written, in the case given above, for instance, the second option is used, for every row, columns 0-4 are concatenated as strings separated by a whitespace and passed to pd.to_datetime()
function and so a format of %Y %m %d %H %M
works here.
Upvotes: 2