baozi
baozi

Reputation: 709

how to get last week's data of all past years from a python dataframe?

I had the following stock daily price data from 2010/12 to 2017/12. How could i select last week's data of each year? I intend to check the performance for the last week of each year.

2017-01-05   52.99  13018070.0   52.370   53.0600   51.4000
2017-01-04   52.86  12556860.0   50.770   53.3400   50.7300
2017-01-03   50.29  15794400.0   48.800   50.3000   48.4700
2016-12-30   46.75  13593420.0   48.365   48.4000   46.3600
2016-12-29   47.77  11728250.0   48.440   48.8600   47.1800
2016-12-28   48.51  14636340.0   50.580   50.7300   48.4700
2016-12-27   50.43   5594876.0   49.690   50.5500   49.6500
2016-12-23   49.59   6966559.0   49.250   49.7200   48.9900
2016-12-22   49.44  10918300.0   50.320   50.5500   49.1711
2016-12-21   50.34   9279635.0   49.820   50.4400   49.6700
2016-12-20   49.53   9533020.0   48.990   49.7900   48.9100
2016-12-19   48.55  10323930.0   47.450   48.6700   47.4300
...
2010-12-20 ...

Upvotes: 1

Views: 1673

Answers (2)

Anton vBR
Anton vBR

Reputation: 18906

You could use groupby passing the datetime year. But first we would need to remove (filter away) data that doesn't meet your criteria. Also make sure your date is datetime.

This code will check that month is equal to December (12) and that the day is larger or equal of 25 (i.e. the last 7 days of each year). If you want the last week of the year you could look at Wen's lambda function.

data = '''\
2017-12-25   52.99  13018070.0   52.370   53.0600   51.4000
2017-01-04   52.86  12556860.0   50.770   53.3400   50.7300
2017-01-03   50.29  15794400.0   48.800   50.3000   48.4700
2016-12-30   46.75  13593420.0   48.365   48.4000   46.3600
2016-12-29   47.77  11728250.0   48.440   48.8600   47.1800
2016-12-28   48.51  14636340.0   50.580   50.7300   48.4700
2016-12-27   50.43   5594876.0   49.690   50.5500   49.6500
2016-12-23   49.59   6966559.0   49.250   49.7200   48.9900
2016-12-22   49.44  10918300.0   50.320   50.5500   49.1711
2016-12-21   50.34   9279635.0   49.820   50.4400   49.6700
2016-12-20   49.53   9533020.0   48.990   49.7900   48.9100
2016-12-19   48.55  10323930.0   47.450   48.6700   47.4300'''

import io
import pandas as pd

df = pd.read_csv(io.StringIO(data), sep='\s+', header=None, parse_dates=[0])
df = df[df[0].dt.month.eq(12) & df[0].dt.day.le(25)] # remove data

# Groupby year according to: https://stackoverflow.com/a/11397052/7386332
for idx, dfx in df.groupby(df[0].map(lambda x: x.year)):
    print('Dataframe containing {}\'s last week:'.format(idx))
    print(dfx)
    print()

Prints

Dataframe containing 2016's last week:
            0      1           2      3      4        5
7  2016-12-23  49.59   6966559.0  49.25  49.72  48.9900
8  2016-12-22  49.44  10918300.0  50.32  50.55  49.1711
9  2016-12-21  50.34   9279635.0  49.82  50.44  49.6700
10 2016-12-20  49.53   9533020.0  48.99  49.79  48.9100
11 2016-12-19  48.55  10323930.0  47.45  48.67  47.4300

Dataframe containing 2017's last week:
           0      1           2      3      4     5
0 2017-12-25  52.99  13018070.0  52.37  53.06  51.4

Upvotes: 2

BENY
BENY

Reputation: 323226

The data from Anton :-)

df[df.groupby(df[0].dt.year)[0].apply(lambda x : x.dt.week==x.dt.week.max())]
Out[1471]: 
           0      1           2       3      4      5
0 2017-12-25  52.99  13018070.0  52.370  53.06  51.40
3 2016-12-30  46.75  13593420.0  48.365  48.40  46.36
4 2016-12-29  47.77  11728250.0  48.440  48.86  47.18
5 2016-12-28  48.51  14636340.0  50.580  50.73  48.47
6 2016-12-27  50.43   5594876.0  49.690  50.55  49.65

Upvotes: 2

Related Questions