Reputation: 113
I'm trying to import an array from a txt file in Python.
The file contains a series of arrays (or "lists"; I'm not sure of the terminology). Each array holds a varying number of integers, between 15 and 30 or so.
Every method I try reads only one line at a time, but this won't work for me because one array spans 4 lines and I need to read in each array as a whole.
The format of the data is as follows:
9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 89 90 91 92
93 94 95 96 8447 8448 8449 8450 8451 845
8453 8454 8488 8489 8490 164624 164625 164626 164627 164628
164629

13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 91 92 93 94
95 96 97 98 8449 8450 8451 8452 8453 8454
8455 8456 8488 8489 8490 8491 164626 164627 164628 164629
164630 164631 164632 164633 164666 164667 164668

17 18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 93 94 95 96
97 98 99 100 8451 8452 8453 8454 8455 8456
8457 8458 8489 8490 8491 8492 164628 164629 164630 164631
164632 164633 164634 164635 164666 164667 164668

21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 95 96 97 98
99 100 101 102 8453 8454 8455 8456 8457 8458
8459 8460 8490 8491 8492 8493 164630 164631 164632 164633
164634 164635 164636 164667 164668 164669 164670
I am the one generating this file, so I can change it in any way that would make this simpler.
I've tried readlines, genfromtxt and loadtxt.
Everything I could find, and everything I could get to work, returns the data one line at a time, so the first entry would be:
9 10 11 12 13 14 15 16 17 18
as opposed to:
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 89 90 91 92 93 94 95 96 8447 8448 8449 8450 8451 845 8453 8454 8488 8489 8490 164624 164625 164626 164627 164628 164629
This is the code used to generate the output file:
    for i in c_array:
        n_array = []
        for j in i:
            for k in range(8):
                a = []
                sorted_c_array = sorted_c_arrays[k]
                c_col = sorted_c_array[:,k]
                b = (binarySearch(c_col,j,sorted_c_array))
                if b == True:
                    n_array.append(np.array(a))
                else:
                    continue
        n_array = np.reshape(n_array,(1,(np.size(n_array))))
        n_array = np.unique(n_array)
        output.writelines(str(n_array).replace(']',']\n'))
Using this:
    c_array = []
    a = []
    for l in file("C:/Users/09432191/SkyDrive/Masters/python/Finished programs/Pre- Prosessing/current_conectivity2.dat"):
        line = l.strip()
        if l == "\n" :
            c_array.append(a)
            a = []
        a.append(line)
    print c_array[0]
This is as far as I got. I can't figure out how to get rid of the unwanted characters, though:
['[ 9 10 11 12 13 14 15 16 17 18', '19 20 21 22 23 24 89 90 91 92', '93 94 95 96 8447 8448 8449 8450 8451 8452', '8453 8454 8488 8489 8490 164624 164625 164626 164627 164628', '164629]']
Upvotes: 1
Views: 174
Reputation:
If you only need to access the file with NumPy, you can use np.save and np.load. These store the data in a much more convenient binary format: no conversion between integers and strings is needed, and it is much faster than using text files. The code also becomes really simple and straightforward:
    import numpy as np
    arr = np.random.randint(1, 200000, (180000, 47))
    np.save('test.npy', arr)  # 250 ms on my system
    loaded_arr = np.load('test.npy')  # 55 ms on my system
    # alternatively, using text-based files
    # (fmt='%d' writes integers rather than the default float format):
    np.savetxt('test.txt', arr, fmt='%d')  # 19 seconds
    loaded_arr = np.loadtxt('test.txt', dtype=int)  # 32 seconds
This way you don't have 180000 separate arrays but one big data structure, in which you can access each sub-array by slicing. The data must be a single 2D array when you save it, but it shouldn't be too difficult to adapt your code to produce that format (at least if every sub-array has the same size).
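If the sub-arrays do not all have the same length, as in the question, one possible workaround is to pad them to a common width before saving. The following is only a sketch: the sample `rows`, the helper `pad_rows`, the file name `padded.npy` and the sentinel `PAD = -1` are all assumptions made for illustration, and the sentinel must be a value that never occurs in the real data.

```python
import os
import tempfile

import numpy as np

# Hypothetical ragged input: sub-arrays of different lengths.
rows = [[9, 10, 11], [13, 14, 15, 16, 17], [17, 18]]

PAD = -1  # assumed sentinel; must not occur in the real data


def pad_rows(rows, pad=PAD):
    """Return a 2D integer array with short rows padded on the right."""
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), pad, dtype=np.int64)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out


padded = pad_rows(rows)

# Save and reload through a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'padded.npy')
np.save(path, padded)
loaded = np.load(path)

# Recover the variable-length rows by dropping the padding values.
restored = [row[row != PAD].tolist() for row in loaded]
```

The padding costs some disk space, but it keeps the single-array layout that np.save and np.load handle directly.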
Upvotes: 2
Reputation: 1575
A simple way to read this data back is to just read line by line and accumulate the values into an array, let's call it "row". Then when we read an empty line we just append the row to the result array and clear the current "row". If we want to handle the case when the file doesn't end with an empty line, then we have to handle that case explicitly:
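    res = []
    row = []
    with open('/tmp/data.txt') as f:
        for l in f:
            line = l.strip().split()
            if not line:
                # blank line: the current array is complete
                res.append(row)
                row = []
            else:
                row.extend(line)
    # handle a file that does not end with an empty line
    if row:
        res.append(row)
    print(res)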
If you want, you can use the data while you are scanning the file instead of loading it all into memory. An easy way to let other parts of your program decide whether to load the whole data set into memory, without changing the way it is read, is to use Python generators:
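    def parseRows(f):
        row = []
        with open(f) as lines:
            for l in lines:
                line = l.strip().split()
                if not line:
                    # blank line: hand the finished array to the caller
                    yield row
                    row = []
                else:
                    row.extend(line)
        # yield the last array if the file does not end with an empty line
        if row:
            yield row

    for r in parseRows('/tmp/data.txt'):
        print(r)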
The result of parseRows is a "generator": an object that can be iterated over as if it were a list, but that computes its values lazily.
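To make the lazy behaviour concrete, here is a small self-contained sketch. It is a variant of the idea above, not the same function: the name `parse_int_rows` and the sample text are made up for illustration, it accepts any iterable of lines (an `io.StringIO` stands in for an open file), and it converts the values to integers, assuming the file contains only whitespace-separated integers.

```python
import io


def parse_int_rows(lines):
    """Yield one list of ints per blank-line-separated block."""
    row = []
    for l in lines:
        fields = l.split()
        if fields:
            row.extend(int(x) for x in fields)
        elif row:
            # blank line after some data: the block is complete
            yield row
            row = []
    # last block, if the input does not end with a blank line
    if row:
        yield row


# Stand-in for an open file; a real file object iterates the same way.
sample = io.StringIO(u"9 10 11\n12 13\n\n13 14 15\n16\n")

rows = parse_int_rows(sample)
first = next(rows)  # only the first block has been read so far
rest = list(rows)   # consume the remaining blocks
```

Calling `next` reads just enough lines to produce one array, while `list` forces the rest of the input to be read.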
Upvotes: 0
Reputation: 6881
This might not be the most efficient way, but have a look:
    with open("example.txt", "r") as f:
        mystr = f.read()
    # arrays are separated by blank lines
    myarray = mystr.split("\n\n")
    myarraylist = list()
    for arraystr in myarray:
        arraystr = arraystr.strip('\n')
        myarraylist.append(arraystr.split())
    print(myarraylist)
This outputs:
[['9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '89', '90', '91', '92', '93', '94', '95', '96', '8447', '8448', '8449', '8450', '8451', '845', '8453', '8454', '8488', '8489', '8490', '164624', '164625', '164626', '164627', '164628', '164629'], ['13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '91', '92', '93', '94', '95', '96', '97', '98', '8449', '8450', '8451', '8452', '8453', '8454', '8455', '8456', '8488', '8489', '8490', '8491', '164626', '164627', '164628', '164629', '164630', '164631', '164632', '164633', '164666', '164667', '164668'], ['17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '93', '94', '95', '96', '97', '98', '99', '100', '8451', '8452', '8453', '8454', '8455', '8456', '8457', '8458', '8489', '8490', '8491', '8492', '164628', '164629', '164630', '164631', '164632', '164633', '164634', '164635', '164666', '164667', '164668'], ['21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '95', '96', '97', '98', '99', '100', '101', '102', '8453', '8454', '8455', '8456', '8457', '8458', '8459', '8460', '8490', '8491', '8492', '8493', '164630', '164631', '164632', '164633', '164634', '164635', '164636', '164667', '164668', '164669', '164670']]
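The values above are still strings, and the file shown in the question also contains square brackets. One way to deal with both, sketched here under the assumption that the arrays are separated by blank lines and contain only non-negative integers, is to pull the digits out of each block with a regular expression. The sample `text` and the helper `parse_blocks` are hypothetical.

```python
import re

# Hypothetical sample in the bracketed form shown in the question.
text = "[ 9 10 11\n 12 13]\n\n[13 14 15\n 16]\n"


def parse_blocks(text):
    """Split on blank lines and keep only the integers in each block."""
    arrays = []
    for block in re.split(r"\n\s*\n", text):
        # \d+ matches runs of digits, so brackets and whitespace are ignored
        numbers = [int(tok) for tok in re.findall(r"\d+", block)]
        if numbers:  # skip empty blocks, e.g. after a trailing blank line
            arrays.append(numbers)
    return arrays


arrays = parse_blocks(text)
```

Because only digit runs are kept, this would silently drop minus signs, so it is not suitable for data with negative values.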
Upvotes: 2