user2957178

Reputation: 113

Issue reading arrays from a .txt file

I'm trying to import an array from a txt file in Python.

The file contains a number of arrays (or "lists"; I'm not sure of the terminology). Each array contains a varying number of integers, between 15 and 30 or so.

Every method I have tried reads the file line by line, but that won't work for me: one array spans several lines, and I need to read in each array as a whole.

The format of the data is as follows:

  9     10     11     12     13     14     15     16     17     18
  19     20     21     22     23     24     89     90     91     92
  93     94     95     96   8447   8448   8449   8450   8451   845
 8453   8454   8488   8489   8490 164624 164625 164626 164627 164628
 164629

 13     14     15     16     17     18     19     20     21     22
 23     24     25     26     27     28     91     92     93     94
 95     96     97     98   8449   8450   8451   8452   8453   8454
8455   8456   8488   8489   8490   8491 164626 164627 164628 164629
164630 164631 164632 164633 164666 164667 164668

17     18     19     20     21     22     23     24     25     26
 27     28     29     30     31     32     93     94     95     96
 97     98     99    100   8451   8452   8453   8454   8455   8456
8457   8458   8489   8490   8491   8492 164628 164629 164630 164631
164632 164633 164634 164635 164666 164667 164668

21     22     23     24     25     26     27     28     29     30
 31     32     33     34     35     36     95     96     97     98
 99    100    101    102   8453   8454   8455   8456   8457   8458
8459   8460   8490   8491   8492   8493 164630 164631 164632 164633
164634 164635 164636 164667 164668 164669 164670

I am the one generating this file, so I can change its format in any way that would make this simpler.

I've tried readlines, genfromtxt, and loadtxt.

Everything I could find, and any approach I can get to work, reads the file per line, so the first entry would be:

9 10 11 12 13 14 15 16 17 18

as opposed to:

9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 89 90 91 92 93 94 95 96 8447 8448 8449 8450 8451 845 8453 8454 8488 8489 8490 164624 164625 164626 164627 164628 164629

This is the code used to generate the output file:

for i in c_array:
    n_array = []
    for j in i:
        for k in range(8):
            a = []
            sorted_c_array = sorted_c_arrays[k]
            c_col = sorted_c_array[:,k]
            b = (binarySearch(c_col,j,sorted_c_array))
            if b == True:
                n_array.append(np.array(a))
            else:
                continue
    n_array = np.reshape(n_array,(1,(np.size(n_array))))
    n_array = np.unique(n_array)
    output.writelines(str(n_array).replace(']',']\n'))

Using this:

c_array = []
a = []
for l in file("C:/Users/09432191/SkyDrive/Masters/python/Finished programs/Pre-  Prosessing/current_conectivity2.dat"):
    line = l.strip()
    if l == "\n" :
        c_array.append(a)
        a = []
    a.append(line)

print c_array[0]

I got as far as the output below, but I can't figure out how to get rid of the unwanted characters:

['[     9     10     11     12     13     14     15     16     17     18', '19     20     21     22     23     24     89     90     91     92', '93     94     95     96   8447   8448   8449   8450   8451   8452', '8453   8454   8488   8489   8490 164624 164625 164626 164627 164628', '164629]']

Upvotes: 1

Views: 174

Answers (3)

user2379410

Reputation:

If you only need to access the file with NumPy, you can use np.save and np.load. This stores the data in a much more convenient binary format: no conversion between integers and strings is needed, which makes it much faster than using text files. The code also becomes simple and straightforward:

import numpy as np

arr = np.random.randint(1, 200000, (180000, 47))

np.save('test.npy', arr)  # 250 ms on my system
loaded_arr = np.load('test.npy')  # 55 ms on my system


# alternatively, using text-based files:
np.savetxt('test.txt', arr)  # 19 seconds
loaded_arr = np.loadtxt('test.txt', dtype=int)  # 32 seconds

This way you don't have 180000 separate arrays but one big data structure, in which you can access each sub-array by slicing. Note that the data must then be saved as one single 2D array, but it shouldn't be too difficult to adapt your code to save your data in this format (at least if every sub-array has the same size).
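Since the arrays in the question vary in length, one possible way to fit them into a single 2D array is to pad the shorter ones with a sentinel value. This is only a minimal sketch, assuming -1 never occurs in the real data; `pad_rows`, `unpad_row`, the sample rows, and the file name `padded.npy` are hypothetical and chosen for illustration:

```python
import os
import tempfile

import numpy as np


def pad_rows(rows, fill=-1):
    """Stack variable-length integer sequences into one 2D array, padded with `fill`."""
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), fill, dtype=np.int64)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out


def unpad_row(row, fill=-1):
    """Return the row without its padding entries."""
    return row[row != fill]


# Small made-up rows standing in for the real arrays
rows = [[9, 10, 11], [13, 14, 15, 16, 17], [17, 18]]
padded = pad_rows(rows)

# Round-trip through the binary .npy format (written to a temporary directory)
path = os.path.join(tempfile.mkdtemp(), "padded.npy")
np.save(path, padded)
loaded = np.load(path)

first = unpad_row(loaded[0])
```

Each original array can then be recovered with `unpad_row(loaded[i])`. The padding costs some disk space, so this probably only makes sense when the lengths are similar, as they appear to be here.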

Upvotes: 2

mkm

Reputation: 1575

A simple way to read this data back is to read line by line and accumulate the values into a list; let's call it "row". When we read an empty line, we append the row to the result list and start a new "row". If the file doesn't end with an empty line, the last row has to be handled explicitly after the loop:

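res = []
row = []
with open('/tmp/data.txt') as f:
    for l in f:
        line = l.strip().split()
        if not line:
            # Blank line: the current array is complete
            res.append(row)
            row = []
        else:
            row.extend(line)
if row:
    # Last array when the file doesn't end with a blank line
    res.append(row)

print(res)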

If you want, you can use the data while you are scanning the file instead of loading all of it into memory. An easy way to let other parts of your program decide whether to load the whole data set into memory, without changing the way it is read, is to use a Python generator:

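def parseRows(path):
    row = []
    with open(path) as f:
        for l in f:
            line = l.strip().split()
            if not line:
                # Blank line: hand the finished array to the caller
                yield row
                row = []
            else:
                row.extend(line)
    if row:
        # Last array when the file doesn't end with a blank line
        yield row


for r in parseRows('/tmp/data.txt'):
    print(r)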

The result of parseRows is a "generator", an object that can be iterated over as if it were a list, but that computes its values lazily.
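To illustrate the lazy behaviour, the sketch below feeds a generator from an in-memory file and pulls out one block at a time. It is written for Python 3; `parse_rows` is a hypothetical variant of the function above that takes an iterable of lines rather than a path and converts the values to `int`, and the sample text is made up:

```python
import io


def parse_rows(lines):
    """Yield one list of ints per blank-line-separated block of `lines`."""
    row = []
    for line in lines:
        fields = line.split()
        if fields:
            row.extend(int(x) for x in fields)
        elif row:
            # Blank line after some data: the block is complete
            yield row
            row = []
    if row:
        # Last block when the input does not end with a blank line
        yield row


sample = io.StringIO("9 10 11\n12 13\n\n13 14\n15\n")

rows = parse_rows(sample)
first_block = next(rows)  # reads only up to the first blank line
rest = list(rows)         # reads whatever is left
```

Calling `list(parse_rows(f))` loads everything at once, while a `for` loop or `next()` processes one array at a time, so the caller decides how much is held in memory.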

Upvotes: 0

Lucas Kauffman

Reputation: 6881

This might not be an efficient way but have a look:

with open("example.txt", "r") as f:
    mystr = f.read()  # read the whole file into one string

# Arrays are separated by a blank line
myarraylist = []
for arraystr in mystr.split("\n\n"):
    # split() without arguments splits on any whitespace, including newlines
    myarraylist.append(arraystr.split())

print(myarraylist)

This outputs:

[['9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '89', '90', '91', '92', '93', '94', '95', '96', '8447', '8448', '8449', '8450', '8451', '845', '8453', '8454', '8488', '8489', '8490', '164624', '164625', '164626', '164627', '164628', '164629'], ['13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '91', '92', '93', '94', '95', '96', '97', '98', '8449', '8450', '8451', '8452', '8453', '8454', '8455', '8456', '8488', '8489', '8490', '8491', '164626', '164627', '164628', '164629', '164630', '164631', '164632', '164633', '164666', '164667', '164668'], ['17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '93', '94', '95', '96', '97', '98', '99', '100', '8451', '8452', '8453', '8454', '8455', '8456', '8457', '8458', '8489', '8490', '8491', '8492', '164628', '164629', '164630', '164631', '164632', '164633', '164634', '164635', '164666', '164667', '164668'], ['21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '95', '96', '97', '98', '99', '100', '101', '102', '8453', '8454', '8455', '8456', '8457', '8458', '8459', '8460', '8490', '8491', '8492', '8493', '164630', '164631', '164632', '164633', '164634', '164635', '164636', '164667', '164668', '164669', '164670']]
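The lists above contain strings, and the file written by `str(n_array)` also contains `[` and `]` characters, as the output in the question shows. One possible follow-up step, sketched here for Python 3, strips the brackets and converts each block to a NumPy integer array. It assumes the blocks are separated by blank lines, as in the sample data; `blocks_to_arrays` is a hypothetical helper name and the sample text is made up:

```python
import numpy as np


def blocks_to_arrays(text):
    """Split `text` on blank lines and return one 1-D integer array per block."""
    arrays = []
    for block in text.split("\n\n"):
        # Replace the brackets left by str(ndarray), then split on whitespace
        fields = block.replace("[", " ").replace("]", " ").split()
        if fields:  # skip empty blocks, e.g. after a trailing blank line
            arrays.append(np.array([int(x) for x in fields], dtype=np.int64))
    return arrays


sample = "[   9   10   11\n  12 8447]\n\n[  13   14\n 164629]\n\n"
arrays = blocks_to_arrays(sample)
```

Here `arrays` is a plain Python list of arrays, which sidesteps the problem that the blocks have different lengths.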

Upvotes: 2
