Dave

Reputation: 564

Improve the speed of for loop over a loaded file

I have a dataset in text file in the following form:

5851F42D00000000,1
4BB5F64640B18CCF,2
742D2F7A0AE16FD9,1
76035E090D1F0796,1
6FA72CA540F7702C,3
.
.
.

The file contains 500K rows. My goal is to read the file and convert the hex values to binary. The following code works, but it is very slow. Is there a trick to make it faster?

import pandas as pd
import numpy as np

df = pd.read_csv(path + 'dataset.txt', sep=",", header=None)
X = []
y = []
for i, row in df.iterrows():
    n = int('{:064b}'.format(int(row.values[0], 16)))
    X.append(n)
    y.append(row.values[1])
X = np.asarray(X)
y = np.asarray(y)

Upvotes: 1

Views: 32

Answers (1)

RomanPerekhrest

Reputation: 92854

There is no need for the redundant loop and appending to lists.
Use pandas "magic":

df = pd.read_csv('test.csv', sep=",", header=None)
x = df[0].apply(lambda x: int('{:064b}'.format(int(x, 16)))).to_numpy()
y = df[1].to_numpy()
print(x, y)
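If `.apply` with a Python-level lambda is still too slow, a plain list comprehension over the column is sometimes faster, since it skips pandas' per-row dispatch. This is only a sketch: it uses `io.StringIO` with a few sample rows in place of the real file, and it reproduces the question's exact transformation (the 64-character binary string is re-parsed as a base-10 integer, as in the original code). Whether it actually beats `.apply` should be timed on the full 500K-row file.

```python
import io
import numpy as np
import pandas as pd

# Sample rows in the same layout as the question's file;
# io.StringIO stands in for the real dataset here.
data = io.StringIO(
    "5851F42D00000000,1\n"
    "4BB5F64640B18CCF,2\n"
    "742D2F7A0AE16FD9,1\n"
)
df = pd.read_csv(data, sep=",", header=None)

# List comprehension instead of Series.apply.
# The 64-digit "binary" numbers exceed int64, so NumPy stores them
# in an object-dtype array (the original code produces the same).
x = np.array([int('{:064b}'.format(int(v, 16))) for v in df[0]],
             dtype=object)
y = df[1].to_numpy()
```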

Upvotes: 2
