GFG
GFG

Reputation: 55

Python help optimize this function

data = 
        Symbol   Value  Day
0         AACG  1.8708    1
1         AACG  1.8500    2
2         AACG  1.8869    3
3         AACG  1.8200    4
4         AACG  1.8578    5
...        ...     ...  ...
3407024   ZYXI   5.25    1
3407025   ZYXI   4.96    2
3407026   ZYXI   4.99    3
3407027   ZYXI   4.99    4
3407028   ZYXI   4.95    5
...        ...    ...  ...
3407250   ZYXI  8.1500  227
3407251   ZYXI  8.2600  228
3407252   ZYXI  8.3900  229
3407253   ZYXI  8.1200  230
3407254   ZYXI  8.0700  231
import pandas as pd
import numpy as np

for index, row in data.iterrows():
    for i in range(1, 91):
        cstr = 'day-' + str(i)
        val = 'NaN'
        try:
            val = float(data[np.logical_and(data['Symbol'] == row['Symbol'],
                            data['Day'] == row['Day'] - i)].Value)
        except:
            val = 'NaN'
        data.loc[index,cstr] = val

The function loops through each row in the data frame

for each row in the data frame, it loops 90 times (i)

for each loop, it adds a column with value

value is the value in data frame with the same symbol as row but day as day from row minus i

output =
  Symbol   Value  Day   day-1   day-2   day-3   day-4... day-89 day-90
0   AACG  1.8708    1     NaN     NaN     NaN     NaN
1   AACG  1.8500    2  1.8708     NaN     NaN     NaN
2   AACG  1.8869    3  1.8500  1.8708     NaN     NaN
3   AACG  1.8200    4  1.8869  1.8500  1.8708     NaN
4   AACG  1.8578    5  1.8200  1.8869  1.8500  1.8708
5   AACG  1.8709    6  1.8578  1.8200  1.8869  1.8500
6   AACG  1.8700    7  1.8709  1.8578  1.8200  1.8869
7   AACG  1.8800    8  1.8700  1.8709  1.8578  1.8200
8   AACG  1.8000    9  1.8800  1.8700  1.8709  1.8578
9   AACG  1.7900   10  1.8000  1.8800  1.8700  1.8709

Upvotes: 2

Views: 69

Answers (2)

Joe Ferndz
Joe Ferndz

Reputation: 8508

You can do the following:

  • Step 1: Create a dictionary with 90 columns and assign np.nan to it. Each column will have Day_ + str(i) where i ranges from 1 thru 90.
  • Step 2: Create a dataframe with those 90 columns.
  • Step 3: Concatenate the dataframe to your original dataframe. Now you have the 90 additional columns with np.NaN as values.
  • Step 4: Now do shift(1) of Value after groupby(Symbol)
  • Step 5: Now iterate from 2 thru 90 and do shift(1) of Day_1. That will give you all the required values.

The code to do this is:

c = ['Symbol','Value','Day']
d = [['AACG',1.8708,1],
     ['AACG',1.8500,2],
     ['AACG',1.8869,3],
     ['AACG',1.8200,4],
     ['AACG',1.8578,5],
     ['ZYXI',5.25,1],
     ['ZYXI',4.96,2],
     ['ZYXI',4.99,3],
     ['ZYXI',4.99,4],
     ['ZYXI',4.95,5]]

import pandas as pd
import numpy as np
df = pd.DataFrame(d,columns=c)
cols = {'Day_'+str(i):np.NaN for i in range(1,91)}

df = pd.concat([df,pd.DataFrame(cols,index=df.index)], axis=1)

for i in range (1,91):
    df['Day_'+str(i)] = df.groupby(['Symbol'])['Value'].transform(lambda x:x.shift(i))

print (df)

The output of this will be:

  Symbol   Value  Day   Day_1   Day_2  ...  Day_86  Day_87  Day_88  Day_89  Day_90
0   AACG  1.8708    1     NaN     NaN  ...     NaN     NaN     NaN     NaN     NaN
1   AACG  1.8500    2  1.8708     NaN  ...     NaN     NaN     NaN     NaN     NaN
2   AACG  1.8869    3  1.8500  1.8708  ...     NaN     NaN     NaN     NaN     NaN
3   AACG  1.8200    4  1.8869  1.8500  ...     NaN     NaN     NaN     NaN     NaN
4   AACG  1.8578    5  1.8200  1.8869  ...     NaN     NaN     NaN     NaN     NaN
5   ZYXI  5.2500    1     NaN     NaN  ...     NaN     NaN     NaN     NaN     NaN
6   ZYXI  4.9600    2  5.2500     NaN  ...     NaN     NaN     NaN     NaN     NaN
7   ZYXI  4.9900    3  4.9600  5.2500  ...     NaN     NaN     NaN     NaN     NaN
8   ZYXI  4.9900    4  4.9900  4.9600  ...     NaN     NaN     NaN     NaN     NaN
9   ZYXI  4.9500    5  4.9900  4.9900  ...     NaN     NaN     NaN     NaN     NaN

I will create a dataframe with 90+ rows for AACG and show you the results so you can see that Day_90 will have the right value.

I added a few more records to show you that the 90 days column gets filled.

>>> df.iloc[80:100]
   Symbol   Value  Day   Day_1   Day_2  ...  Day_86  Day_87  Day_88  Day_89  Day_90
80   AACG  1.8659   81  1.8658  1.8657  ...     NaN     NaN     NaN     NaN     NaN
81   AACG  1.8660   82  1.8659  1.8658  ...     NaN     NaN     NaN     NaN     NaN
82   AACG  1.8661   83  1.8660  1.8659  ...     NaN     NaN     NaN     NaN     NaN
83   AACG  1.8662   84  1.8661  1.8660  ...     NaN     NaN     NaN     NaN     NaN
84   AACG  1.8663   85  1.8662  1.8661  ...     NaN     NaN     NaN     NaN     NaN
85   AACG  1.8664   86  1.8663  1.8662  ...     NaN     NaN     NaN     NaN     NaN
86   AACG  1.8665   87  1.8664  1.8663  ...  1.8708     NaN     NaN     NaN     NaN
87   AACG  1.8666   88  1.8665  1.8664  ...  1.8500  1.8708     NaN     NaN     NaN
88   AACG  1.8667   89  1.8666  1.8665  ...  1.8869  1.8500  1.8708     NaN     NaN
89   AACG  1.8668   90  1.8667  1.8666  ...  1.8200  1.8869  1.8500  1.8708     NaN
90   AACG  1.8669   91  1.8668  1.8667  ...  1.8578  1.8200  1.8869  1.8500  1.8708
91   AACG  1.8670   92  1.8669  1.8668  ...  1.8584  1.8578  1.8200  1.8869  1.8500
92   AACG  1.8671   93  1.8670  1.8669  ...  1.8585  1.8584  1.8578  1.8200  1.8869
93   AACG  1.8672   94  1.8671  1.8670  ...  1.8586  1.8585  1.8584  1.8578  1.8200
94   AACG  1.8673   95  1.8672  1.8671  ...  1.8587  1.8586  1.8585  1.8584  1.8578
95   ZYXI  5.2500    1     NaN     NaN  ...     NaN     NaN     NaN     NaN     NaN
96   ZYXI  4.9600    2  5.2500     NaN  ...     NaN     NaN     NaN     NaN     NaN
97   ZYXI  4.9900    3  4.9600  5.2500  ...     NaN     NaN     NaN     NaN     NaN
98   ZYXI  4.9900    4  4.9900  4.9600  ...     NaN     NaN     NaN     NaN     NaN
99   ZYXI  4.9500    5  4.9900  4.9900  ...     NaN     NaN     NaN     NaN     NaN

Upvotes: 1

Pygirl
Pygirl

Reputation: 13349

try using shift and pd.concat

N = 5
df_new = pd.DataFrame()
for i,grp in df.groupby('Symbol'):
    l = pd.concat([grp['Value'].shift(i).rename(f'Day_{i}') for i in range(1,N)], axis=1)
    final_df = pd.concat([grp, l], axis=1)
    df_new = df_new.append(final_df)

OR

def f(x):
    x['Day-0'] = x['Value']
    for i in range(1,N+1):
        x[f'Day-{i}'] = x[f'Day-{i-1}'].shift()
    x.drop('Day-0', inplace=True ,axis=1)
    return x

final_df = df.groupby('Symbol').apply(f)

**final_df:"

enter image description here

Upvotes: 1

Related Questions