Reputation: 3275
I have a Python pandas dataframe with daily data that looks like:
Date Open High Low Close Adj Close Volume
0 2009-12-31 31.709999 31.840000 31.299999 31.309999 23.547892 1957700
1 2010-01-04 31.480000 31.840000 31.330000 31.469999 23.668222 3472500
2 2010-01-05 31.549999 31.770000 31.400000 31.639999 23.796082 3458700
3 2010-01-06 31.600000 31.889999 31.430000 31.559999 23.735907 3745800
4 2010-01-07 31.549999 31.700001 31.049999 31.230000 23.487726 7232100
How can I save the first and last entry for every year? If the last entry for a year is let's say October 31st, I would like to keep this one as the last entry for the year.
Upvotes: 3
Views: 1473
Reputation: 22360
Sort by Date
if the rows are not already in sorted order i.e., something like:
df = df.sort_values(by='Date')
Group by year and only retain the first and last elements of each group:
df.groupby(pd.DatetimeIndex(df.Date).to_period('Y')).nth([0,-1])
Output using example data frame from question:
Date Open High Low Close Adj Close Volume
Date
2009 2009-12-31 31.709999 31.840000 31.299999 31.309999 23.547892 1957700
2010 2010-01-04 31.480000 31.840000 31.330000 31.469999 23.668222 3472500
2010 2010-01-07 31.549999 31.700001 31.049999 31.230000 23.487726 7232100
Note: If there is only one entry per year like there is in the example, (2009), that row will only be in the output once, not twice, but if used on real data that would not be an issue anyway.
Upvotes: 5
Reputation: 345
Assuming you're using pandas, which IMO is the correct choice here, and this is a pandas DataFrame.
I feel like the best option would the to first index the dataframe by the date. Then you could sort the dataframe by index. This makes it so you can easily go through the rows and pick the first and last entry for each month.
You could also use pandas.DataFrame.loc to search for the correct entries. If you know exactly what dates you're looking for, then this might be the best approach.
You could approach this by making a list with all the dates you want data for. Then loop through the list and find all the corresponding entries and store them in a separate dataframe.
I recommend looking through the official documentation for pandas and especially pandas.Dataframe
Upvotes: 1