Reputation: 4318
I have a large text file, produced by some code, that contains lists of numbers. The lists in the file look like this:
[ 11.42102518 3.3538624 231.82137052 352.12777653] [ 12.68274035 2.84982539 292.99135783 331.74058558] [ 11.34657161 3.38423623 265.82486527 335.52547905] [ 12.74354078 3.57487634 241.38692542 322.61793392] [ 12.34540891 7.43226428 241.87210696 364.56618065] [ 12.11139764 4.19664745 239.1656334 321.70798174] [ 12.78659285 5.29236544 232.36062356 315.21861344] [ 12.69345477 3.21991939 285.64027138 356.25664941] [ 12.50813292 3.81440083 277.67523696 334.8178125 ] [ 13.1380115 2.84102649 270.39461828 357.04828265] [ 14.07759576 2.32715376 287.91432844 326.39725223] [ 11.85596781 4.0823778 290.16288598 353.67141937] [ 15.40525653 2.91725879 261.31334931 362.72949817] [ 15.01504576 2.46403931 275.26133082 333.77638185] [ 15.28245578 2.98091548 247.72494962 311.64421065] [ 13.49572046 2.52735399 265.58225678 332.79688739] [ 12.82575874 3.98127768 230.90060671 312.34328907] [ 16.76159178 4.02880401 281.66098464 320.10349045]
After every 500*20 lists there is a newline (\n) separator.
I would like to read them into an N x 4 NumPy array. I do not know in advance how many lists the file contains. How can I do it?
Upvotes: 3
Views: 103
Reputation: 4318
The input file contains an unknown number of groups of lists, where each group holds 20 lists of 4 elements, and I would like to read all of them into a single array with 4 columns. The best approach I have come across so far for combining all the lists into one big 4-column array is the following:
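import re
import numpy as np

ndim = 4
with open('sampler_chain.dat') as fh:
    f = fh.read()
pattern = re.compile("^[0-9]")  # keep only tokens that start with a digit
data = np.zeros((1, ndim), float)
i = 0
dl = []
for x in f.split():
    if pattern.match(x):
        # keep only the leading decimal number, which drops a trailing "]"
        m = re.match(r"(\d+)\.(\d+)", x)
        x = m.group(0)
        print(x)
        if i // ndim == 0:
            # the first ndim values fill the preallocated first row
            data[0, i % ndim] = float(x)
        else:
            dl.append(float(x))
            if i % ndim == ndim - 1:
                # a full row has been collected: append it to the array
                data = np.append(data, [dl], axis=0)
                dl = []
        i += 1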
The problem is that this method is not fast enough to read a huge file of data ...
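Most of the time in a loop like the one above likely goes into the per-token regex match, the per-token print, and np.append, which copies the whole array every time a row is added. A minimal sketch of a vectorized alternative follows, assuming every list in the file has exactly 4 numeric entries; the function name parse_bracketed and the short sample string are made up for illustration.

```python
import numpy as np

def parse_bracketed(text, ndim=4):
    """Parse text like '[ 1.0 2.0 3.0 4.0] [ 5.0 ...]' into an (N, ndim) float array."""
    # Once every list is known to have ndim entries the brackets carry no
    # information, so turn them into whitespace and split on any whitespace
    # (spaces, tabs, newlines).
    tokens = text.replace("[", " ").replace("]", " ").split()
    values = np.array(tokens, dtype=float)
    # reshape(-1, ndim) lets NumPy infer N; it raises ValueError if the
    # number of values is not a multiple of ndim.
    return values.reshape(-1, ndim)

# Hypothetical sample in the same layout as the file, including a newline.
sample = (
    "[ 11.42102518 3.3538624 231.82137052 352.12777653] "
    "[ 12.50813292 3.81440083 277.67523696 334.8178125 ]\n"
    "[ 12.68274035 2.84982539 292.99135783 331.74058558]"
)
data = parse_bracketed(sample)
print(data.shape)
```

For the real file, the same function could be called as parse_bracketed(open('sampler_chain.dat').read()), which reads the whole file into memory as one string.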
Upvotes: 0
Reputation: 83
This code stores all the numbers in a single flat list. I'm not sure whether that is what you actually want. :)
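import re

# Read the whole file as one string; the with-block closes the file.
with open('text.txt') as fh:
    text = fh.read()
pattern = re.compile("^[0-9]")
_array = []
for x in text.split():
    # drop the "]" that is attached to the last number of a list
    x = x.replace(']', '')
    if pattern.match(x):
        _array.append(float(x))
print(_array)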
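If the N x 4 layout from the question is wanted, a flat list like the one built above can be reshaped afterwards. This is only a sketch: the list named flat below is a made-up stand-in holding two of the lists from the question, and it assumes the total count of numbers is a multiple of 4.

```python
import numpy as np

# Hypothetical stand-in for the flat list of floats (two lists of four numbers).
flat = [11.42102518, 3.3538624, 231.82137052, 352.12777653,
        12.68274035, 2.84982539, 292.99135783, 331.74058558]
ndim = 4

# Fail early with a clear message if a list in the file was incomplete.
if len(flat) % ndim != 0:
    raise ValueError("number of values is not a multiple of %d" % ndim)

# -1 lets NumPy work out the number of rows from the list length.
data = np.asarray(flat, dtype=float).reshape(-1, ndim)
print(data.shape)
```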
Upvotes: 1
Reputation: 862
First, remove the [ and ] at the start and end of each line. Second, split the line into the individual lists. Finally, split each list into its elements and append them to a buffer.
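buff = []
for line in open("file.txt"):
    line = line.strip()
    if not line:
        continue  # skip blank separator lines
    # Step 1: drop the outer "[" and "]"; step 2: split into the individual lists
    for arr in line[1:-1].split("]"):
        # Step 3: drop the "[" left over from the split, then split on any
        # whitespace so that both tabs and spaces work as separators
        row = [float(el) for el in arr.replace("[", "").split()]
        buff.append(row)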
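Because the file is broken into lines, the line-by-line loop above can also be combined with NumPy so that only one line is held as text at a time. The following is a rough sketch rather than a tuned implementation: read_chunks is a hypothetical helper, the io.StringIO object stands in for an open file, and every list is assumed to have exactly 4 numbers.

```python
import io
import numpy as np

def read_chunks(lines, ndim=4):
    """Build an (N, ndim) array from an iterable of bracketed text lines."""
    chunks = []
    for line in lines:
        # Brackets become whitespace; split() handles tabs and spaces alike.
        tokens = line.replace("[", " ").replace("]", " ").split()
        if not tokens:
            continue  # blank separator line
        chunks.append(np.array(tokens, dtype=float).reshape(-1, ndim))
    if not chunks:
        return np.empty((0, ndim))
    # One concatenate at the end avoids copying the array on every line.
    return np.concatenate(chunks, axis=0)

# Stand-in for an open file: two data lines separated by a blank line.
demo = io.StringIO("[ 1.0 2.0 3.0 4.0] [ 5.0 6.0 7.0 8.0]\n\n[ 9.0 10.0 11.0 12.0]\n")
arr = read_chunks(demo)
print(arr.shape)
```

With a real file, the call would be read_chunks(open("file.txt")), since iterating over a file object yields its lines.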
Upvotes: 0