adrCoder
adrCoder

Reputation: 3275

Python pandas dataframe - daily data - get first and last day for every year

I have a Python pandas dataframe with daily data that looks like:

            Date       Open       High        Low      Close  Adj Close   Volume
0     2009-12-31  31.709999  31.840000  31.299999  31.309999  23.547892  1957700
1     2010-01-04  31.480000  31.840000  31.330000  31.469999  23.668222  3472500
2     2010-01-05  31.549999  31.770000  31.400000  31.639999  23.796082  3458700
3     2010-01-06  31.600000  31.889999  31.430000  31.559999  23.735907  3745800
4     2010-01-07  31.549999  31.700001  31.049999  31.230000  23.487726  7232100

How can I save the first and last entry for every year? If the last entry for a year is let's say October 31st, I would like to keep this one as the last entry for the year.

Upvotes: 3

Views: 1473

Answers (2)

Sash Sinha
Sash Sinha

Reputation: 22360

  1. Sort by Date if the rows are not already in sorted order i.e., something like:

    df = df.sort_values(by='Date')

  2. Group by year and only retain the first and last elements of each group:

    df.groupby(pd.DatetimeIndex(df.Date).to_period('Y')).nth([0,-1])

Output using example data frame from question:

            Date       Open       High        Low      Close  Adj Close  Volume
Date                                                                            
2009  2009-12-31  31.709999  31.840000  31.299999  31.309999  23.547892  1957700
2010  2010-01-04  31.480000  31.840000  31.330000  31.469999  23.668222  3472500
2010  2010-01-07  31.549999  31.700001  31.049999  31.230000  23.487726  7232100

Note: If there is only one entry per year like there is in the example, (2009), that row will only be in the output once, not twice, but if used on real data that would not be an issue anyway.

Upvotes: 5

SirAchesis
SirAchesis

Reputation: 345

Assuming you're using pandas, which IMO is the correct choice here, and this is a pandas DataFrame.

I feel like the best option would the to first index the dataframe by the date. Then you could sort the dataframe by index. This makes it so you can easily go through the rows and pick the first and last entry for each month.

You could also use pandas.DataFrame.loc to search for the correct entries. If you know exactly what dates you're looking for, then this might be the best approach.

You could approach this by making a list with all the dates you want data for. Then loop through the list and find all the corresponding entries and store them in a separate dataframe.

I recommend looking through the official documentation for pandas and especially pandas.Dataframe

Upvotes: 1

Related Questions