Reputation: 19
Good afternoon,
I'm iterating through a huge DataFrame (104062 x 20) with the following code:
import pandas as pd

df_tot = pd.read_csv("C:\\Users\\XXXXX\\Desktop\\XXXXXXX\\LOGS\\DF_TOT.txt", header=None)

# Strip the brackets and quotes left over from the logged lists
# (raw strings avoid invalid-escape warnings in the regex patterns).
df_tot = df_tot.replace(r"\[", "", regex=True)
df_tot = df_tot.replace(r"\]", "", regex=True)
df_tot = df_tot.replace(r"\'", "", regex=True)

# Compare every row against every other row.
i = 0
while i < len(df_tot):
    to_compare = df_tot.iloc[i].tolist()
    for j in range(len(df_tot)):
        if to_compare == df_tot.iloc[j].tolist():
            if i == j:
                print('Matched itself.')
            else:
                print('MATCH FOUND - row: {} --- match row: {}'.format(i, j))
    i += 1
I am looking to optimize the time spent on each iteration as much as possible, since this code performs 104062^2 comparisons (more or less ten billion iterations).
With my computing power, comparing to_compare against the whole DataFrame takes around 26 seconds.
I want to clarify that, if needed, the whole code could be rewritten with faster constructs.
As usual, thanks in advance.
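For reference, here is a rough sketch of the kind of faster construct I mean (not benchmarked; it assumes the cleaned df_tot from the code above): build a dict keyed on row tuples in a single pass, so each row is looked up once instead of being compared against every other row.
from collections import defaultdict

# Map each row (as a tuple of its values) to the list of indexes where it occurs.
groups = defaultdict(list)
for idx, row in enumerate(df_tot.itertuples(index=False, name=None)):
    groups[row].append(idx)

# Any row appearing under more than one index is a duplicate.
for row_values, indexes in groups.items():
    if len(indexes) > 1:
        print('MATCH FOUND - rows: {}'.format(indexes))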
Upvotes: 0
Views: 58
Reputation: 136
As far as I understand, you just want to find duplicated rows.
Sample data (the last 2 rows are duplicates):
In [1]: df = pd.DataFrame([[1,2], [3,4], [5,6], [7,8], [1,2], [5,6]], columns=['a', 'b'])
df
Out[1]:
a b
0 1 2
1 3 4
2 5 6
3 7 8
4 1 2
5 5 6
This will return all duplicated rows:
In [2]: df[df.duplicated(keep=False)]
Out[2]:
a b
0 1 2
2 5 6
4 1 2
5 5 6
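As a side note (a sketch, not part of the original answer): duplicated() also takes keep='first' (the default) or keep='last' if you only want the extra occurrences rather than every member of a duplicated group:
# With keep='first', only the later occurrences are flagged,
# so on the sample df this returns rows 4 and 5.
df[df.duplicated(keep='first')]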
And the indexes, grouped by duplicated row:
In [3]: df[df.duplicated(keep=False)].reset_index().groupby(list(df.columns), as_index=False)['index'].apply(list)
Out[3]: a b
1 2 [0, 4]
5 6 [2, 5]
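A rough equivalent of the line above (a sketch, not taken from the answer): groupby over all columns exposes a .groups mapping from row values to indexes, which can be filtered down to the groups with more than one member:
# Gives {(1, 2): [0, 4], (5, 6): [2, 5]} for the sample df.
dup_groups = df.groupby(list(df.columns)).groups
{values: list(idx) for values, idx in dup_groups.items() if len(idx) > 1}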
You can also just remove the duplicates from the DataFrame:
In [4]: df.drop_duplicates()
Out[4]:
a b
0 1 2
1 3 4
2 5 6
3 7 8
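drop_duplicates() also accepts the same keep argument, plus a subset parameter if only some columns should define a duplicate; a minimal sketch:
# Keep the last occurrence instead of the first, and decide duplicates
# by column 'a' only.
df.drop_duplicates(subset=['a'], keep='last')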
Upvotes: 1