tmhs

Reputation: 1070

How to efficiently do feature engineering using a loop in Python?

I am trying to do the following:

df['SR1'] = df['Open'].pct_change(1)
df['SR2'] = df['Open'].pct_change(2)
df['SR3'] = df['Open'].pct_change(3)
df['SR4'] = df['Open'].pct_change(4)
df['SR5'] = df['Open'].pct_change(5)

df['SR6'] = df['Open'].pct_change(6)
df['SR7'] = df['Open'].pct_change(7)
df['SR8'] = df['Open'].pct_change(8)
df['SR9'] = df['Open'].pct_change(9)
df['SR10'] = df['Open'].pct_change(10)

df['SR11'] = df['Open'].pct_change(11)
df['SR12'] = df['Open'].pct_change(12)
df['SR13'] = df['Open'].pct_change(13)
df['SR14'] = df['Open'].pct_change(14)
df['SR15'] = df['Open'].pct_change(15)

df['SR16'] = df['Open'].pct_change(16)
df['SR17'] = df['Open'].pct_change(17)
df['SR18'] = df['Open'].pct_change(18)
df['SR19'] = df['Open'].pct_change(19)
df['SR20'] = df['Open'].pct_change(20)

df['SR30'] = df['Open'].pct_change(30)
df['SR50'] = df['Open'].pct_change(50)
df['SR70'] = df['Open'].pct_change(70)
df['SR90'] = df['Open'].pct_change(90)

df['SR110'] = df['Open'].pct_change(110)
df['SR130'] = df['Open'].pct_change(130)
df['SR150'] = df['Open'].pct_change(150)
df['SR170'] = df['Open'].pct_change(170)
df['SR190'] = df['Open'].pct_change(190)

df['SR210'] = df['Open'].pct_change(210)
df['SR230'] = df['Open'].pct_change(230)
df['SR250'] = df['Open'].pct_change(250)

It looks clumsy and repetitive. Is there a neat way to write a function or loop that does this? I just can't work out how to pass the numbers into the parentheses of pct_change().

Upvotes: 1

Views: 161

Answers (3)

Alexander

Reputation: 109546

If you want to be efficient, don't use a loop. You can use assign together with a dictionary comprehension.

df = df.assign(**{f'SR{n}': df['Open'].pct_change(n)
                  for n in list(range(1, 21)) + list(range(30, 270, 20))})

Or not using f-strings:

df = df.assign(**{'SR{}'.format(n): df['Open'].pct_change(n)
                  for n in list(range(1, 21)) + list(range(30, 270, 20))})
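As a minimal sketch of how the assign call behaves, here is the same pattern on a toy DataFrame. The prices and the short `periods` list are assumptions made up for illustration, not data from the question.

```python
import pandas as pd

# Toy prices (assumed for illustration; the question's real data is not shown).
df = pd.DataFrame({'Open': [100.0, 102.0, 101.0, 105.0, 110.0]})

# Hypothetical short list of periods so the result is easy to inspect.
periods = [1, 2, 3]

# Each dictionary key becomes a column name and each value that column's data.
# The ** unpacks the dictionary into keyword arguments for assign.
df = df.assign(**{f'SR{n}': df['Open'].pct_change(n) for n in periods})

print(df.columns.tolist())  # ['Open', 'SR1', 'SR2', 'SR3']
```

assign returns a new DataFrame rather than modifying the original, which is why the result is bound back to df.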

Timings

Marginally faster using the dictionary comprehension.

df = pd.DataFrame({'Open': range(252 * 5)})

%%timeit
df.assign(**{f'SR{n}': df['Open'].pct_change(n)
             for n in list(range(1, 21)) + list(range(30, 270, 20))})
# 25.3 ms ± 2.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for n in list(range(1, 21)) + list(range(30, 270, 20)):
    df[f'SR{n}'] = df['Open'].pct_change(n)
# 28.3 ms ± 3.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Upvotes: 1

gmds

Reputation: 19885

Why not a simple for loop?

for n in list(range(1, 21)) + list(range(30, 270, 20)):
    df[f'SR{n}'] = df['Open'].pct_change(n)

Note: f-string notation only works in Python >= 3.6, and is equivalent to 'SR{}'.format(n).
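For readers unsure what each SR column contains: pct_change(n) compares every value with the value n rows earlier. The sketch below checks that against a manual shift-based calculation on made-up prices (the series and the choice n = 2 are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Made-up prices, not taken from the question.
open_prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])
n = 2

via_pct_change = open_prices.pct_change(n)

# Manual equivalent: current value divided by the value n rows back, minus 1.
via_shift = open_prices / open_prices.shift(n) - 1

# The first n entries are NaN in both, so compare with equal_nan=True.
same = np.allclose(via_pct_change, via_shift, equal_nan=True)
print(same)
```

The first n rows of each SRn column are therefore NaN, which matters for the larger periods such as 250.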

Upvotes: 3

SpghttCd

Reputation: 10860

Perhaps

for n in numbers:
    df['SR'+str(n)] = df['Open'].pct_change(n)

where numbers is a list containing all the periods you'd like to process.
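Since the question asks for a function, here is one possible sketch that builds `numbers` from the periods listed in the question and wraps the loop in a helper. The function name `add_pct_change_features`, its `column` and `prefix` parameters, and the sample data are all hypothetical. It collects the new columns first and joins them with pd.concat, which is a design choice here rather than anything the answers above prescribe.

```python
import pandas as pd


def add_pct_change_features(df, numbers, column='Open', prefix='SR'):
    """Return a new DataFrame with one pct_change column per entry in numbers.

    Hypothetical helper for illustration; the input df is left unchanged.
    """
    new_cols = {f'{prefix}{n}': df[column].pct_change(n) for n in numbers}
    # Join all new columns in one step instead of inserting them one by one.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Periods from the question: 1 to 20, then 30 to 250 in steps of 20.
numbers = list(range(1, 21)) + list(range(30, 270, 20))

# Sample data (assumed): 300 rows so that even SR250 has some non-NaN values.
df = pd.DataFrame({'Open': [float(i) for i in range(1, 301)]})

features = add_pct_change_features(df, numbers)
print(features.shape)  # (300, 33)
```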

Upvotes: 1
