marathonman4202
marathonman4202

Reputation: 37

python pandas time series select day of year

I want to select data from a dataframe for a particular day of the year. Here is what I have so far as a minimal example.

import pandas as pd
from datetime import datetime 
from datetime import timedelta
import numpy.random as npr 
rng = pd.date_range('1/1/1990', periods=365*10, freq='D')
df1 = pd.DataFrame(npr.randn(len(rng)), index=rng)
print df1

That generates:

                   0
1990-01-01 -0.032601
1990-01-02 -0.496401
1990-01-03  0.444490

etc. Now I make a list of dates that I want to extract. I have used this before in pandas, but I suspect this is not the best way to get values for a particular date. Anyway,

td = timedelta(days=31)
dr = pd.date_range(datetime(1990,12,31)+td,datetime(2000,12,31),
                    freq=pd.DateOffset(months=12, days=0))
print dr

This, of course, generates:

DatetimeIndex(['1991-01-31', '1992-01-31', '1993-01-31', '1994-01-31',
               '1995-01-31', '1996-01-31', '1997-01-31', '1998-01-31',
               '1999-01-31', '2000-01-31'],
dtype='datetime64[ns]', freq='<DateOffset: kwds={'months': 12, 'days': 0}>', tz=None)

When I try to slice the dataframe by the list of dates, I generate an error:

monthly_df1 = df1[dr]

Output:

KeyError: "['1991-01-30T16:00:00.000000000-0800' '1992-01-30T16:00:00.000000000-0800'\n 
'1993-01-30T16:00:00.000000000-0800' '1994-01-30T16:00:00.000000000-0800'\n 
'1995-01-30T16:00:00.000000000-0800' '1996-01-30T16:00:00.000000000-0800'\n 
'1997-01-30T16:00:00.000000000-0800' '1998-01-30T16:00:00.000000000-0800'\n 
'1999-01-30T16:00:00.000000000-0800' '2000-01-30T16:00:00.000000000-0800'] 
not in index"

I think that I have two fundamental problems here: (1) there is a better way to extract yearly data for a particular date; and (2) the time series in the dataframe and date_range list are different. I would appreciate information on both problems. Thanks, community.

Upvotes: 1

Views: 2564

Answers (2)

OYRM
OYRM

Reputation: 1415

The easiest method that you can use is to get the year's data with partial string indexing, then resample annually

df1['1990-12-31':].resample('A', how='first')

The string slicing is described in the pandas documentation under 17.4.1 DatetimeIndex Partial String Indexing. With this method, you can cut out the creation of the timedelta, the second date_range, and the complex and erroneous slicing. The resample method is standard, using 'A' as a signifier for "annual" frequency and how='first' to just grab the first matching item.

            0
1990-12-31 -0.600904
1991-12-31 -1.083462
1992-12-31  0.469949
1993-12-31 -0.809852
1994-12-31 -0.165877
1995-12-31  1.460035
1996-12-31 -0.332960
1997-12-31 -0.140873
1998-12-31  1.088687
1999-12-31  0.190218

Check out the pandas documentation, look into 17.4.1 DatetimeIndex Partial String Indexing Here is the TimeSeries documentation mentioning resampling And finally, the API doc for the resample() method

Upvotes: 1

Zero
Zero

Reputation: 76917

You could use .ix to filter dr dates from df1

In [107]: df1.ix[dr]
Out[107]:
                   0
1991-01-31 -1.239096
1992-01-31  0.153730
1993-01-31 -0.685778
1994-01-31  0.132170
1995-01-31  0.154965
1996-01-31  1.800437
1997-01-31  2.725209
1998-01-31 -0.084751
1999-01-31  1.604511
2000-01-31       NaN

Even df1.loc[dr] works.


Also, for this case, you can just pass these conditions to extract the dates

In [108]: df1[(df1.index.month==1) & (df1.index.day==31)]
Out[108]:
                   0
1990-01-31 -0.362652
1991-01-31 -1.239096
1992-01-31  0.153730
1993-01-31 -0.685778
1994-01-31  0.132170
1995-01-31  0.154965
1996-01-31  1.800437
1997-01-31  2.725209
1998-01-31 -0.084751
1999-01-31  1.604511

Upvotes: 3

Related Questions