FanaticPythoner

Reputation: 71

Pandas - Optimizing dataframe multiple conditions reverse nested loop

So I have this piece of code I wrote that works well. However, it's way too slow, as I run it tens of thousands of times. I would like help optimizing it with vectorized operations, but I'm having trouble finding a way to do so, as I'm not an expert in pandas yet.

import pandas as pd


def slowFunctionToOptimize():
    # Variables definition
    minVolume = 2000
    exchange1 = 'binance'
    exchange2 = 'bitmart'
    volEx1Str = 'volume_' + exchange1
    volEx2Str = 'volume_' + exchange2
    threshold = 15.0
    minDuration = 10.0

    # See below for an example dataset
    dataset = pd.read_csv('example.csv', sep='|')
    
    indicesLst = dataset.index.values
    minIndexLst = indicesLst[0]
    
    # Get all indices where diffprice meets or exceeds the specified threshold,
    # and subtract the first index value so they can be used with "iloc" later on
    indicesThresh = dataset.index[dataset.diffprice >= threshold].values - minIndexLst
    
    pv = None
    prevEndIndex = len(dataset)
    
    # Get the largest possible window of rows (sequential order) based on the mean volume
    #  of the two exchanges, where the first value meets or exceeds the threshold
    for startInd in indicesThresh:
        for endInd in range(prevEndIndex, 0, -1):
            if endInd - startInd < minDuration:
                break
    
            dfTmp = dataset.iloc[startInd:endInd, :]
            avgVolume1 = dfTmp[volEx1Str].mean()
            avgVolume2 = dfTmp[volEx2Str].mean()
            if avgVolume1 > minVolume and avgVolume2 > minVolume:
                # Get the final result.
                pv = dfTmp.copy()
                break
    
        # Largest amount of rows found, exiting
        if pv is not None:
            break
    
        prevEndIndex = startInd
    
    if pv is None:
        print('No combination could be found for this iteration.')
        return
    
    return pv

Here's the "example.csv" dataset:

(index) volume_binance volume_bitmart diffprice
17039 69971.80688 4327.175492 47.48390871854886
17040 153370.133088 3528.694959 48.53994165659899
17041 76187.307674 2392.877003 47.29860035648541
17042 104401.51896 3502.658194 47.531388100324854
17043 61927.974127999994 2955.0013229999995 48.09375550725489
17044 66800.204432 2997.791962 48.53118830843571
17045 67560.20299500001 1552.265414 49.42801256594155
17046 134373.92478899998 3411.825602 50.17573122058736
17047 57268.227960000004 408.52263999999997 48.42945038122994
17048 81061.533276 2841.843586 49.37624085103861
17049 71169.214896 2681.685852 49.881164587046925
17050 52260.95781099999 2159.977762 49.72388813015853
17051 29749.781696 2769.8278600000003 50.014876524843785
17052 117725.19930000001 3218.053416 50.38916887841825
17053 102968.103323 394.730464 51.604463023668465
17054 49060.105344 0.0 51.27689583770968
17055 195587.92349000002 0.0 52.2316844839041
17056 198279.83096000002 362.99232800000004 53.64218226150563
17057 114223.564344 1913.151606 55.15065710504449
17058 130622.16151599998 0.0 55.1193699237017
17059 74873.30932900001 0.0 53.25227963525835
17060 69200.632056 0.0 53.53227771010961
17061 53494.92473600001 0.0 54.232051635006584
17062 56407.084068000004 0.0 52.72771332505377
17063 69346.222104 0.0 52.55673222390317
17064 47778.53641 0.0 52.23429951690821
17065 78987.33048 0.0 51.71210206415115
17066 49213.878500000006 0.0 51.116439382586535
17067 77530.21365599999 188.51300000000003 50.44907946170143

Here's the expected output (return variable "pv" inside the function):

(index) volume_binance volume_bitmart diffprice
17039 69971.80688 4327.175492 47.48390871854886
17040 153370.133088 3528.694959 48.53994165659899
17041 76187.307674 2392.877003 47.29860035648541
17042 104401.51896 3502.658194 47.53138810032485
17043 61927.974128 2955.0013229999995 48.09375550725489
17044 66800.204432 2997.791962 48.53118830843571
17045 67560.20299500001 1552.265414 49.42801256594155
17046 134373.92478899998 3411.825602 50.17573122058736
17047 57268.22796 408.52264 48.42945038122994
17048 81061.533276 2841.843586 49.37624085103861
17049 71169.214896 2681.685852 49.881164587046925
17050 52260.95781099999 2159.977762 49.72388813015853
17051 29749.781696 2769.8278600000003 50.014876524843785
17052 117725.1993 3218.053416 50.38916887841825
17053 102968.103323 394.730464 51.60446302366847
17054 49060.105344 0.0 51.27689583770968
17055 195587.92349 0.0 52.2316844839041
17056 198279.83096 362.992328 53.64218226150563
17057 114223.564344 1913.151606 55.15065710504449
17058 130622.16151599998 0.0 55.1193699237017

Upvotes: 0

Views: 80

Answers (1)

Avandale

Reputation: 262

There are several things you can do to optimize this code:

  1. Calculate the means more efficiently by adding a "cumulative volume" column for each of binance and bitmart:

volume1  cumulative volume1  volume2  cumulative volume2
5        5                   3        3
4        9                   2        5
4        13                  4        9

The window's total volume is then simply the difference of the cumulative column at the window's two edges, and dividing by the window length endInd - startInd gives the average — an O(1) lookup instead of an O(n) mean over a slice.

  2. Update only the data you need: copying dataframes is quite inefficient, so you should avoid building dfTmp on every iteration. Simply keep track of startInd and endInd and use the previous trick to calculate the average volumes; slice and copy only once, for the final result.
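Concretely, the cumulative-sum trick from point 1 can be sketched like this (`window_mean` is a hypothetical helper name, and the toy values mirror the small table above):

```python
import pandas as pd

# Toy frame mirroring the illustration table above.
df = pd.DataFrame({'volume1': [5, 4, 4], 'volume2': [3, 2, 4]})
df['cum_volume1'] = df['volume1'].cumsum()
df['cum_volume2'] = df['volume2'].cumsum()

def window_mean(df, col, startInd, endInd):
    """Mean of df[col] over positional rows startInd..endInd-1,
    computed in O(1) from the cumulative column."""
    cum = df['cum_' + col].to_numpy()
    total = cum[endInd - 1] - (cum[startInd - 1] if startInd > 0 else 0.0)
    return total / (endInd - startInd)

# Matches the ordinary slice-then-mean result, without materializing a slice.
print(window_mean(df, 'volume1', 0, 3))  # same as df['volume1'].iloc[0:3].mean()
```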

There are probably some other tricks you can use, but without knowing the exact type of data you are working with, I don't think I can help you more than that.
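Putting the two points above together, the question's function might be restructured along these lines (a sketch, not tested against your full data: `fast_window_search` is a hypothetical name, it takes the dataframe as a parameter instead of reading the CSV, and it works with positional indices throughout):

```python
import numpy as np
import pandas as pd

def fast_window_search(dataset, minVolume=2000, threshold=15.0, minDuration=10.0,
                       vol1='volume_binance', vol2='volume_bitmart'):
    # Cumulative sums with a leading zero: any window's total volume is
    # cum[endInd] - cum[startInd], so its mean costs O(1).
    cum1 = np.concatenate(([0.0], dataset[vol1].to_numpy().cumsum()))
    cum2 = np.concatenate(([0.0], dataset[vol2].to_numpy().cumsum()))

    # Positional indices where diffprice meets the threshold.
    indicesThresh = np.flatnonzero(dataset['diffprice'].to_numpy() >= threshold)

    prevEndIndex = len(dataset)
    for startInd in indicesThresh:
        for endInd in range(prevEndIndex, 0, -1):
            n = endInd - startInd
            if n < minDuration:
                break
            # Window means from the cumulative sums: no slicing, no copying.
            if (cum1[endInd] - cum1[startInd]) / n > minVolume and \
               (cum2[endInd] - cum2[startInd]) / n > minVolume:
                # Copy exactly once, for the final result.
                return dataset.iloc[startInd:endInd].copy()
        prevEndIndex = startInd

    print('No combination could be found for this iteration.')
    return None
```

The search order is the same as in the question; only the per-window work changes, from an O(n) slice-and-mean to two O(1) subtractions.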

Upvotes: 1
