Reputation: 585
I have a massive text file. After skipping the headers, a dummy version looks like this:
1444455 7 8 12 52 45 68 70
1356799 3 3 45 34 23 22 11
I would like to read this into a NumPy array, but np.loadtxt is very slow. The file is named data.txt. Right now I am using:
u=pd.read_csv('data.txt',dtype=np.float16,header=3).values
I have played with the parameters to no avail. If I leave out the dtype, each row of my array is a single long string of numbers. When I pass the dtype, I get the error: invalid literal for float(). I suspect the two kinds of separators in the text file (tabs and single spaces) are also causing confusion. How can I get this into a NumPy array of shape (2, 8)?
Could any of you pros help? Thanks
Upvotes: 1
Views: 1221
Reputation: 862406
It seems you need delim_whitespace=True in read_csv, since the separator is whitespace, together with header=None. Then cast to float:
u=pd.read_csv('data.txt', delim_whitespace=True, header=None).astype(float).values
print (u)
[[ 1.44445500e+06 7.00000000e+00 8.00000000e+00 1.20000000e+01
5.20000000e+01 4.50000000e+01 6.80000000e+01 7.00000000e+01]
[ 1.35679900e+06 3.00000000e+00 3.00000000e+00 4.50000000e+01
3.40000000e+01 2.30000000e+01 2.20000000e+01 1.10000000e+01]]
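For a self-contained check that does not need data.txt, here is a minimal sketch that feeds the two sample rows through io.StringIO. The in-memory text and its mix of tabs and spaces are assumptions made for illustration. It uses sep=r"\s+", which matches any run of whitespace just as delim_whitespace=True does; pandas 2.2 and later deprecate delim_whitespace in favor of this spelling.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for data.txt: the two sample rows with tabs and spaces mixed
# (assumed layout; the real file's header lines are taken to be skipped).
sample = "1444455\t7 8 12 52 45 68 70\n1356799\t3 3 45 34 23 22 11\n"

# sep=r"\s+" treats any run of tabs and/or spaces as a single delimiter.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

# Cast to float (numpy.float64) and pull out the underlying array.
u = df.astype(float).values

print(u.shape)
print(u.dtype)
```

If the real file still has header lines at the top, passing skiprows with the number of lines to drop, alongside header=None, should keep them out of the array.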
Note that the resulting values are numpy.float64, not float16:
u=pd.read_csv('data.txt', delim_whitespace=True, header=None).astype(float)
print (type(u.loc[0,0]))
<class 'numpy.float64'>
If you use dtype=np.float16, you get inf in the first column, because float16 cannot represent values larger than 65504:
u=pd.read_csv('data.txt', dtype=np.float16, delim_whitespace=True, header=None).values
print (u)
[[ inf 7. 8. 12. 52. 45. 68. 70.]
[ inf 3. 3. 45. 34. 23. 22. 11.]]
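The inf values come from float16 overflow and can be reproduced without pandas. The sketch below casts the two first-column values from the sample to float16 and to float32; the choice of float32 as a smaller alternative to float64 is a suggestion of this example, not something the original code uses.

```python
import numpy as np

# Largest finite value a float16 can hold (65504).
f16_max = np.finfo(np.float16).max
print(f16_max)

# First-column values from the sample data.
big = np.array([1444455, 1356799], dtype=np.float64)

# Both values exceed the float16 range, so the cast overflows to inf.
# errstate silences the overflow warning that newer NumPy versions emit.
with np.errstate(over="ignore"):
    as16 = big.astype(np.float16)

# float32 stores integers exactly up to 2**24 = 16777216,
# so these values survive the cast unchanged.
as32 = big.astype(np.float32)

print(as16)
print(as32)
```

If memory is the concern with a massive file, float32 halves the footprint relative to float64 while still holding these values exactly; whether that holds for every column depends on the real data.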
Upvotes: 2