CatOfTheCanals
CatOfTheCanals

Reputation: 69

Extracting and reshaping data depending on date in python

I'm just looking for some help in deciding a method that would be most efficient. I have one dataset with particular dates, there is no regular timestep. For each of these dates I want to create a row with values ranging from 10 days before to 3 days after the date. The data I need is in 2 columns, dates in one, values in the other.

What sprung to mind was to use a loop to compare the dates and extract the values I need. I am thinking there might be a better way, using numpy\pandas or maybe something else? I feel like my idea is a fairly convoluted way of going about things.

EDIT: So the data in would be like this.

Date        Values     
2014-02-09  38.351
2014-02-10  38.281
2014-02-11  38.146
2014-02-12  38.205
2014-02-13  38.428
2014-02-14  38.449
2014-02-15  38.540
2014-02-16  38.586
2014-02-17  38.489
2014-02-18  38.552
2014-02-19  38.580
2014-02-20  38.447
2014-02-21  38.336
2014-02-22  38.284
2014-02-23  38.183
2014-02-24  38.143
2014-02-25  38.146
2014-02-26  38.221
2014-02-27  38.182
2014-02-28  38.170 

And a sample output for one row would be in the form:

                  t-10     t-9     t-8     t-7     t-6     t-5     t-4     t-3  \
    Date                                                                         
    2014-02-19  37.728  37.753  37.652  37.549  37.474  37.407  37.344  37.278   

                   t-2    t-1       t     t+1     t+2     t+3  
    Date                                                       
    2014-02-19  37.221  37.18  37.125  37.138  37.414  37.394  

Where the values from t-10 to t+3 are extracted when t = 2014-02-19. I need to do this for several different dates.

Edit: I have these specific dates I need to use. The values t-10 to t+3 with t as each of the below dates for example. This is what lead me to consider using a loop. But it seems like a messy way of doing things.

              Date
    0   2014-11-22
    1   2014-12-28
    2   2015-01-02
    3   2015-02-04
    4   2015-02-16
    5   2015-02-28
    6   2015-03-12
    7   2015-03-24
    8   2015-04-05
    9   2015-04-15
    10  2015-04-17
    11  2015-04-20
    12  2015-11-07
    13  2015-11-10
    14  2015-11-19
    15  2015-11-22
    16  2015-11-29
    17  2015-12-01
    18  2015-12-04
    19  2015-12-11

Upvotes: 1

Views: 281

Answers (2)

Federico Garza
Federico Garza

Reputation: 157

Using exactly the table you provided, I first created the columns before calling them just for clarity.

df['Date'] = pd.to_datetime(df['Date'],format='%Y-%m-%d')

for daysDelta in range(-10,4):
        key = 't'+str(daysDelta)
        df[key] = np.nan

Wich gives me this when I look at the first five rows:

        Date    Values t-10 t-9 t-8 t-7 t-6 t-5 t-4 t-3 t-2 t-1 t0  t1  t2  t3
0   2014-09-02  38.351  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1   2014-10-02  38.281  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2   2014-11-02  38.146  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3   2014-12-02  38.205  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   2014-02-13  38.428  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Then you want to loop through every row and assign the correct value to each column using the index (which row to assing) and key (which column):

for index, row in df.iterrows():
    for daysDelta in range(-10,4): #loops through days
        key = 't'+str(daysDelta)

        # will be true if the difference of days is the one you are looking for
        booleanTimeDelta = ((df.loc[:,'Date'] -row['Date']).dt.days == daysDelta) 

        # if any are true find them and assign it  
        if any(booleanTimeDelta):
            df.loc[index:index+1,key] = df.loc[booleanTimeDelta,'Values'].values

This is the output for your example

         Date  Values    t-10     t-9     t-8     t-7     t-6     t-5     t-4     t-3     t-2     t-1      t0      t1      t2      t3
0  2014-09-02  38.351     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.351     NaN     NaN     NaN
1  2014-10-02  38.281     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.281     NaN     NaN     NaN
2  2014-11-02  38.146     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.146     NaN     NaN     NaN
3  2014-12-02  38.205     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.205     NaN     NaN     NaN
4  2014-02-13  38.428     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.428  38.449  38.540  38.586
5  2014-02-14  38.449     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.428  38.449  38.540  38.586  38.489
6  2014-02-15  38.540     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.428  38.449  38.540  38.586  38.489  38.552
7  2014-02-16  38.586     NaN     NaN     NaN     NaN     NaN     NaN     NaN  38.428  38.449  38.540  38.586  38.489  38.552  38.580
8  2014-02-17  38.489     NaN     NaN     NaN     NaN     NaN     NaN  38.428  38.449  38.540  38.586  38.489  38.552  38.580  38.447
9  2014-02-18  38.552     NaN     NaN     NaN     NaN     NaN  38.428  38.449  38.540  38.586  38.489  38.552  38.580  38.447  38.336
10 2014-02-19  38.580     NaN     NaN     NaN     NaN  38.428  38.449  38.540  38.586  38.489  38.552  38.580  38.447  38.336  38.284
11 2014-02-20  38.447     NaN     NaN     NaN  38.428  38.449  38.540  38.586  38.489  38.552  38.580  38.447  38.336  38.284  38.183
12 2014-02-21  38.336     NaN     NaN  38.428  38.449  38.540  38.586  38.489  38.552  38.580  38.447  38.336  38.284  38.183  38.143
13 2014-02-22  38.284     NaN  38.428  38.449  38.540  38.586  38.489  38.552  38.580  38.447  38.336  38.284  38.183  38.143  38.146
14 2014-02-23  38.183  38.428  38.449  38.540  38.586  38.489  38.552  38.580  38.447  38.336  38.284  38.183  38.143  38.146  38.221
15 2014-02-24  38.143  38.449  38.540  38.586  38.489  38.552  38.580  38.447  38.336  38.284  38.183  38.143  38.146  38.221  38.182
16 2014-02-25  38.146  38.540  38.586  38.489  38.552  38.580  38.447  38.336  38.284  38.183  38.143  38.146  38.221  38.182  38.182
17 2014-02-26  38.221  38.586  38.489  38.552  38.580  38.447  38.336  38.284  38.183  38.143  38.146  38.221  38.182  38.182     NaN
18 2014-02-27  38.182  38.489  38.552  38.580  38.447  38.336  38.284  38.183  38.143  38.146  38.221  38.182  38.182     NaN     NaN

Upvotes: 1

Federico Garza
Federico Garza

Reputation: 157

I'm going to make many assumptions about what you want but I think I understand your problem. You have something like this.

In [1]: df
Out[1]: 
    dates   numbers sumOfDates
0   2016-02-04  1   NaN
1   2016-02-13  2   NaN
2   2016-01-25  4   NaN
3   2016-01-16  1   NaN
4   2016-01-27  3   NaN
5   2016-01-13  4   NaN
6   2016-01-15  5   NaN
7   2016-01-29  1   NaN
8   2016-01-11  3   NaN
9   2016-01-17  4   NaN

All those NaN is where

Here numbers being the values you talked about. Then this should be simple by iterating through rows locating all the files within the desired range. Then just apply to the sum and assign it to the appropiate row in the sumOfDates

for index, row in df.iterrows():
    df.loc[index,'sumOfDates'] = df[(df.dates >(row['dates']-pd.DateOffset(10)))].loc[(df.dates <
    (row['dates']+pd.DateOffset(3)))].numbers.sum()

The output should be the following:

df
dates   numbers sumOfDates
0   2016-02-04  1   5.0
1   2016-02-13  2   3.0
2   2016-01-25  4   12.0
3   2016-01-16  1   17.0
4   2016-01-27  3   8.0
5   2016-01-13  4   12.0
6   2016-01-15  5   17.0
7   2016-01-29  1   8.0
8   2016-01-11  3   7.0
9   2016-01-17  4   17.0

If I'm wrong please post some examples so we can see if this is what you're looking for. But I hope you can see what you can do with pandas.

Upvotes: 0

Related Questions