Stef85be
Stef85be

Reputation: 25

Reading in hourly data with pd.read_csv()

I'm reading in a csv file with pd.read_csv()

The data contains a financial OHLC time series, it's hourly data. The date and the hour are values seperated with a , .

Now I have two series, one for date and one for hour. Is there a way to use the two values in one series so I can convert the whole thing to a datetime object.

The read method should skip the first comma and use the two first (date and hour) values to store in a series that I will use as index.

df = pd.read_csv (csvpath, names=['Date', 'Hour', 'Open_Price','High_Price','Low_Price','Close_Price'], usecols=[0,1,2,3,4,5]) df['Date'] = pd.to_datetime(df.Date)

I tried converting the hour data to a datetime object but now it's assigning dates to it that don't match the actual data.

Thank you,

Upvotes: 0

Views: 521

Answers (2)

Stef85be
Stef85be

Reputation: 25

Solved it by creating a new csv and then reading it in with pandas.

def csv_edit(csv_in, csv_out):
# Reads in first two columns (Dates and hours)
Date, Hour = np.loadtxt (csv_in, dtype = 'str', delimiter = ',', unpack = True, usecols=[0,1])
# Reads in ohlc data
df = pd.read_csv(csv_in, delimiter = ',', names=['Open_Price','High_Price','Low_Price','Close_Price'], usecols = [2,3,4,5])
# Combines dates and hours to one series
df['Date'] = [x + '.' + y for x, y in zip(Date, Hour)]
# Sets index to Date
df.set_index('Date', inplace=True)
# Writes to new csv
df.to_csv(csv_out)

Upvotes: 0

Ben Siver
Ben Siver

Reputation: 2948

This is likely an issue with the format of the dates being parsed from your CSV.

to_datetime accepts a format parameter, which is a string you should create based on the format of the dates you are reading from the CSV. Note that when you leave this parameter off, to_datetime will make a best effort to guess the format of the dates you're parsing. Here's a reference for how you can create the format string: http://strftime.org/

As a simple example: if your dates look like 02-12-19, you could use: pd.to_datetime(df.Date, format='%m-%d-%y')

Upvotes: 1

Related Questions