Dave

Reputation: 564

Improve the speed of for loop over a loaded file

I have a dataset in text file in the following form:

5851F42D00000000,1
4BB5F64640B18CCF,2
742D2F7A0AE16FD9,1
76035E090D1F0796,1
6FA72CA540F7702C,3
.
.
.

The file contains 500K rows. My goal is to read the file and convert the hex values to binary. The following code works, but it is very slow. Is there a trick to make it faster?

import pandas as pd
import numpy as np

df = pd.read_csv(path + 'dataset.txt', sep=",", header=None)
X = []
y = []
for i, row in df.iterrows():
    n = int('{:064b}'.format(int(row.values[0], 16)))
    X.append(n)
    y.append(row.values[1])
X = np.asarray(X)
y = np.asarray(y)

Upvotes: 1

Views: 32

Answers (1)

RomanPerekhrest

Reputation: 92854

There is no need for the redundant loop and appending to lists.
Use pandas "magic":

df = pd.read_csv('test.csv', sep=",", header=None)
x = df[0].apply(lambda x: int('{:064b}'.format(int(x, 16)))).to_numpy()
y = df[1].to_numpy()
print(x, y)
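If `.apply` with a Python-level lambda is still too slow, a plain list comprehension over the column is sometimes faster, since it skips pandas' per-row dispatch. This is only a sketch: it uses `io.StringIO` with a few sample rows in place of the real file, and it reproduces the question's exact transformation (the 64-character binary string is re-parsed as a base-10 integer, as in the original code). Whether it actually beats `.apply` should be timed on the full 500K-row file.

```python
import io
import numpy as np
import pandas as pd

# Sample rows in the same layout as the question's file;
# io.StringIO stands in for the real dataset here.
data = io.StringIO(
    "5851F42D00000000,1\n"
    "4BB5F64640B18CCF,2\n"
    "742D2F7A0AE16FD9,1\n"
)
df = pd.read_csv(data, sep=",", header=None)

# List comprehension instead of Series.apply.
# The 64-digit "binary" numbers exceed int64, so NumPy stores them
# in an object-dtype array (the original code produces the same).
x = np.array([int('{:064b}'.format(int(v, 16))) for v in df[0]],
             dtype=object)
y = df[1].to_numpy()
```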

Upvotes: 2
