Reputation: 564
I have a dataset in a text file in the following form:
5851F42D00000000,1
4BB5F64640B18CCF,2
742D2F7A0AE16FD9,1
76035E090D1F0796,1
6FA72CA540F7702C,3
.
.
.
The file contains 500K rows. My goal is to read the file and convert the hex values to binary. The following code works fine, but it is very slow. Is there a trick to make it faster?
import pandas as pd
import numpy as np
df = pd.read_csv(path+ 'dataset.txt', sep=",", header=None)
X = []
y = []
for i, row in df.iterrows():
    n = int('{:064b}'.format(int(row.values[0], 16)))
    X.append(n)
    y.append(row.values[1])
X = np.asarray(X)
y = np.asarray(y)
Upvotes: 1
Views: 32
Reputation: 92854
There is no need for the explicit loop or for appending to lists.
Use pandas "magic":
df = pd.read_csv('test.csv', sep=",", header=None)
x = df[0].apply(lambda x: int('{:064b}'.format(int(x, 16)))).to_numpy()
y = df[1].to_numpy()
print(x, y)
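One caveat: int('{:064b}'.format(...)) yields a Python integer whose decimal digits are the bits, and values with that many digits do not fit in a 64-bit integer, so NumPy will likely store them as generic Python objects. If the real goal is a 0/1 array with one column per bit (that is an assumption on my part, the question does not say), a rough sketch could look like the following. The in-memory sample and the names values, shifts and bits are only for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for dataset.txt, using rows from the question.
sample = io.StringIO(
    "5851F42D00000000,1\n"
    "4BB5F64640B18CCF,2\n"
    "742D2F7A0AE16FD9,1\n"
)

# Read the hex column as text so all-digit values are not parsed as numbers.
df = pd.read_csv(sample, sep=",", header=None, dtype={0: str})

# Parse each hex string into an unsigned 64-bit integer.
values = np.array([int(h, 16) for h in df[0]], dtype=np.uint64)

# Shift every value by 63, 62, ..., 0 and mask the lowest bit,
# giving one row of 64 bits per value, most significant bit first.
shifts = np.arange(63, -1, -1, dtype=np.uint64)
bits = ((values[:, None] >> shifts) & np.uint64(1)).astype(np.uint8)

y = df[1].to_numpy()
print(bits.shape, y)
```

The only Python-level loop left is the hex parsing; the bit extraction runs inside NumPy over the whole column at once.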
Upvotes: 2