Reputation: 25
I'm reading in a csv file with pd.read_csv()
The data contains a financial OHLC time series, it's hourly data. The date and the hour are values seperated with a , .
Now I have two series, one for date and one for hour. Is there a way to use the two values in one series so I can convert the whole thing to a datetime object.
The read method should skip the first comma and use the two first (date and hour) values to store in a series that I will use as index.
df = pd.read_csv (csvpath, names=['Date', 'Hour', 'Open_Price','High_Price','Low_Price','Close_Price'], usecols=[0,1,2,3,4,5])
df['Date'] = pd.to_datetime(df.Date)
I tried converting the hour data to a datetime object but now it's assigning dates to it that don't match the actual data.
Thank you,
Upvotes: 0
Views: 521
Reputation: 25
Solved it by creating a new csv and then reading it in with pandas.
def csv_edit(csv_in, csv_out):
# Reads in first two columns (Dates and hours)
Date, Hour = np.loadtxt (csv_in, dtype = 'str', delimiter = ',', unpack = True, usecols=[0,1])
# Reads in ohlc data
df = pd.read_csv(csv_in, delimiter = ',', names=['Open_Price','High_Price','Low_Price','Close_Price'], usecols = [2,3,4,5])
# Combines dates and hours to one series
df['Date'] = [x + '.' + y for x, y in zip(Date, Hour)]
# Sets index to Date
df.set_index('Date', inplace=True)
# Writes to new csv
df.to_csv(csv_out)
Upvotes: 0
Reputation: 2948
This is likely an issue with the format of the dates being parsed from your CSV.
to_datetime
accepts a format
parameter, which is a string you should create based on the format of the dates you are reading from the CSV. Note that when you leave this parameter off, to_datetime
will make a best effort to guess the format of the dates you're parsing. Here's a reference for how you can create the format string: http://strftime.org/
As a simple example: if your dates look like 02-12-19
, you could use: pd.to_datetime(df.Date, format='%m-%d-%y')
Upvotes: 1