Krishna Kumar

Reputation: 73

How can I optimize the code below to run faster? My dataframe has almost 100,000 rows.

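def encoder(expiry_dt, expiry1, expiry2, expiry3):
    # Return the position (1, 2 or 3) of the expiry that matches expiry_dt.
    # If nothing matches, the function falls through and returns None.
    if expiry_dt == expiry1:
        return 1
    if expiry_dt == expiry2:
        return 2
    if expiry_dt == expiry3:
        return 3


# Build SYMBOL_INSTRUMENT_STRIKE_OPTIONTYPE_<expiry position>, one row at a time.
FINAL['Expiry_encodings'] = FINAL.apply(
    lambda row: '{0}_{1}_{2}_{3}_{4}'.format(
        row['SYMBOL'], row['INSTRUMENT'], row['STRIKE_PR'], row['OPTION_TYP'],
        encoder(row['EXPIRY_DT'], row['Expiry1'], row['Expiry2'], row['Expiry3'])),
    axis=1)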

The code works, but it is too slow. Is there an alternative that produces the same result in less time?

Sample Dataframe

Upvotes: 0

Views: 106

Answers (2)

9769953

Reputation: 12221

Give the following a try:

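# Default '0' for rows whose EXPIRY_DT matches none of the three expiries.
FINAL['expiry_number'] = '0'
# Assign 3, then 2, then 1, so a lower number overrides a higher one (same priority as the if-chain).
for c in '321':
    FINAL.loc[FINAL['EXPIRY_DT'] == FINAL['Expiry'+c], 'expiry_number'] = c

# Build the key with column-wise string concatenation instead of per-row str.format.
FINAL['Expiry_encodings'] = FINAL['SYMBOL'].astype(str) + '_' + \
    FINAL['INSTRUMENT'].astype(str) + '_' + FINAL['STRIKE_PR'].astype(str) + \
    '_' + FINAL['OPTION_TYP'].astype(str) + '_' + FINAL['expiry_number']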

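This avoids the three if statements and provides a default value ('0') when none of them would evaluate to True (the original encoder falls through and returns None, which str.format renders as 'None'). It also replaces the per-row string formatting with column-wise string concatenation. On top of that, it avoids apply with a lambda, which calls a Python function once for every row.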
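A minimal, self-contained sketch that runs the question's apply version and the vectorized version above side by side on a tiny frame. The column names come from the question; the values are invented for illustration only, and the expiry columns are assumed to hold directly comparable values (plain strings here).

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are made up.
FINAL = pd.DataFrame({
    'SYMBOL': ['AAA', 'AAA', 'BBB'],
    'INSTRUMENT': ['OPTIDX', 'OPTIDX', 'OPTSTK'],
    'STRIKE_PR': [100.0, 110.0, 250.0],
    'OPTION_TYP': ['CE', 'PE', 'CE'],
    'EXPIRY_DT': ['2019-01-31', '2019-02-28', '2019-04-25'],
    'Expiry1': ['2019-01-31'] * 3,
    'Expiry2': ['2019-02-28'] * 3,
    'Expiry3': ['2019-03-28'] * 3,
})


def encoder(expiry_dt, expiry1, expiry2, expiry3):
    # The question's encoder: implicitly returns None when nothing matches.
    if expiry_dt == expiry1:
        return 1
    if expiry_dt == expiry2:
        return 2
    if expiry_dt == expiry3:
        return 3


# Original row-wise version, kept only for comparison.
slow = FINAL.apply(
    lambda row: '{0}_{1}_{2}_{3}_{4}'.format(
        row['SYMBOL'], row['INSTRUMENT'], row['STRIKE_PR'], row['OPTION_TYP'],
        encoder(row['EXPIRY_DT'], row['Expiry1'], row['Expiry2'], row['Expiry3'])),
    axis=1)

# Vectorized version from the answer: one boolean mask per expiry column.
FINAL['expiry_number'] = '0'
for c in '321':
    FINAL.loc[FINAL['EXPIRY_DT'] == FINAL['Expiry' + c], 'expiry_number'] = c

fast = (FINAL['SYMBOL'].astype(str) + '_' + FINAL['INSTRUMENT'].astype(str) + '_'
        + FINAL['STRIKE_PR'].astype(str) + '_' + FINAL['OPTION_TYP'].astype(str)
        + '_' + FINAL['expiry_number'])

# The first two rows agree; the unmatched third row ends in 'None' vs '0'.
print(slow.tolist())
print(fast.tolist())
```

The only intended difference is the suffix for rows that match no expiry, so if downstream code relied on the literal 'None', it needs adjusting.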

Note on the '321' order: it mirrors the evaluation order of the if-chain in the original code. 'Expiry3' has the lowest priority, so in my code it is assigned first and then overridden by matches on #2 and then #1. The original if-chain short-circuits at #1, which gives it the highest priority. For example, if 'Expiry1' and 'Expiry3' both equal 'EXPIRY_DT', the assigned value is 1, not 3.
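If relying on the override order feels fragile, numpy.select is another vectorized option (a swapped-in technique, not part of the answer above): it takes the choice belonging to the first condition that is True, so listing the conditions as Expiry1, Expiry2, Expiry3 reproduces the if-chain's priority directly. A minimal sketch on a hypothetical two-row frame (invented values) where, in the first row, Expiry1 and Expiry3 both equal EXPIRY_DT:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: in row 0, Expiry1 and Expiry3 both equal EXPIRY_DT;
# row 1 matches none of the expiries.
FINAL = pd.DataFrame({
    'EXPIRY_DT': ['2019-01-31', '2019-05-30'],
    'Expiry1': ['2019-01-31', '2019-01-31'],
    'Expiry2': ['2019-02-28', '2019-02-28'],
    'Expiry3': ['2019-01-31', '2019-03-28'],
})

# Conditions in priority order; np.select uses the first one that is True,
# and falls back to `default` when none is.
conditions = [FINAL['EXPIRY_DT'] == FINAL['Expiry' + c] for c in '123']
FINAL['expiry_number'] = np.select(conditions, [1, 2, 3], default=0).astype(str)

print(FINAL['expiry_number'].tolist())
```

Integer choices are converted to strings afterwards so the column can be concatenated into Expiry_encodings exactly as in the answer.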

Upvotes: 3

Krishna Kumar

Reputation: 73

Same solution as above, with a slight change:

FINAL['expiry_number'] = '0'
for c in '321':
    FINAL.loc[FINAL['EXPIRY_DT'] == FINAL['Expiry'+c], 'expiry_number'] = c

FINAL['Expiry_encodings'] = FINAL['SYMBOL'].astype(str) + '_' + \
    FINAL['INSTRUMENT'].astype(str) + '_' + FINAL['STRIKE_PR'].astype(str) + \
    '_' + FINAL['OPTION_TYP'].astype(str) + '_' + FINAL['expiry_number']

Upvotes: 0
