Reputation: 333
I am trying to combine 2 separate data series using one minute data to create a ratio then creating Open High Low Close (OHLC) files for the ratio for the entire day. I am bringing in two time series then creating associated dataframes using pandas. The time series have missing data so I am creating a datetime variable in each file then merging the files using the pd.merge approach on the datetime variable. Up this this point everything is going fine.
Next I group the data by the date using groupby. I then feed the grouped data to a for loop that calculates the OHLC and feeds that into a new dataframe for each respective day. However, the newly populated dataframe uses the date (from the grouping) as the dataframe index and the sorting is off. The index data looks like this (even when sorted):
01/29/2013 01/29/2014 01/29/2015 12/2/2013 12/2/2014
In short, the sorting is being done on only the month not the whole date as a date so it isn't chronological. My goal is to get it sorted by date so it would be chronological. Perhaps I need to create a new column in the dataframe referencing the index (not sure how). Or maybe there is a way to tell pandas the index is a date not just a value? I tried using various sort approaches including sort_index but since the dates are the index and don't seem to be treated as dates the sort functions sort by the month regardless of the year and thus my output file is out of order. In more general terms I am not sure how to reference/manipulate the actual unique identifier index in a pandas dataframe so any associated material would be useful.
Thank you
Upvotes: 1
Views: 2352
Reputation: 333
Years later...
This fixes the problem.
import pandas as pd
df.index = pd.to_datetime(df.index) #convert the index to a datetime object
df = df.sort_index() #sort the converted
This should get the sorting back into chronological order
Upvotes: 2