qwerty22

Reputation: 101

How to read a file into multiple lists

I'm working on a project where I need to read lines from a text file (named marks.txt) and put the values into lists. The file is organized so that it's just a matter of reading each line and adding it to each of the 5 lists in order, then repeating until the end of the file. I looked around but was unable to find what I needed, despite finding a couple of questions that seemed promising. This one had the right idea, but only applies to a single list, and this one seemed like it would be the answer, but it's way more advanced than what I can use or understand. Finally, I found this one, which is really close, and I tried doing something along the lines of answer 3 (most similar to what I learned a couple of years ago), but it doesn't work for me. If anyone has any ideas on how I could adapt some of those examples to my work, that would be excellent. For this project, it doesn't have to be efficient, just working and simple in terms of the elements used. This is what I have right now:

studentName= mark1=mark2= mark3=mark4 = []
dataFile=open(dataFileRaw, "r")
for line in dataFile:
    studentName.append(line) #line 1 goes to array one, line 6 goes to array 1, second value, etc
    mark1.append(line) #line 2 goes to array two
    mark2.append(line) #line 3 goes to array three
    mark3.append(line) #line 4 goes to array four
    mark4.append(line) #line 5 goes to array five
dataFile.close()

Upvotes: 1

Views: 3210

Answers (4)

PM 2Ring

Reputation: 55469

studentName= mark1=mark2= mark3=mark4 = [] won't do what you want. It creates a single list and binds it to multiple names. So if you modify studentName the modifications will replicate to mark1, mark2 etc. See List of lists changes reflected across sublists unexpectedly
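A minimal sketch of that aliasing, using made-up names (shared_a, sep_a, etc. are illustrative only and not part of the question's code):

```python
# Chained assignment: every name is bound to the SAME list object.
shared_a = shared_b = shared_c = []
shared_a.append("x")
print(shared_b)              # the append shows up here too
print(shared_a is shared_c)  # both names refer to one object

# Separate list displays: each name gets its own, independent list.
sep_a, sep_b, sep_c = [], [], []
sep_a.append("x")
print(sep_b)                 # still empty
```

The second form is what the question's first line would need if it were to keep five individually named lists.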

But here's some code that uses a list of lists which will gather your data properly. It uses .strip() to strip leading & trailing white space (including newlines) from each line of data. It also uses the with keyword so you don't need to explicitly close the file.

marks = [[] for _ in range(5)]

with open(dataFileRaw, "r") as dataFile:
    for i, line in enumerate(dataFile):
        marks[i % 5].append(line.strip())

enumerate() is a built-in function that takes an iterable object as its first argument and an optional start number as its second argument. It returns a new iterable object that yields pairs of values (in the form of tuples), with the first value in the pair being a count, and the second value being the next element from the original iterable object.

From help(enumerate)

enumerate(iterable[, start]) -> iterator for index, value of iterable

Return an enumerate object. iterable must be another object that supports iteration. The enumerate object yields pairs containing a count (from start, which defaults to zero) and a value yielded by the iterable argument. enumerate is useful for obtaining an indexed list:

(0, seq[0]), (1, seq[1]), (2, seq[2]), ...

It may help to see some examples:

for i,c in enumerate('qwerty'):    
    print i, c

output

0 q
1 w
2 e
3 r
4 t
5 y

We can also supply a start argument to enumerate(), eg

seq = ['one', 'two', 'three']
for i, c in enumerate(seq, 1):
    print i, c

output

1 one
2 two
3 three

The % operator is the modulo operator. a % b yields the remainder when we divide the integer a by the integer b. Eg,

for i in range(12):
    print i % 4

output

0
1
2
3
0
1
2
3
0
1
2
3

Putting % together with enumerate() lets us do this sort of thing:

for i, c in enumerate('_abcdefghij'):
    print i%5, c

output

0 _
1 a
2 b
3 c
4 d
0 e
1 f
2 g
3 h
4 i
0 j

So do you now understand what

for i, line in enumerate(dataFile):
    marks[i % 5].append(line.strip())

does?
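To try the loop without touching a real file, here is a small self-contained sketch. It assumes an in-memory io.StringIO object as a stand-in for the opened marks.txt, and the student names and marks are invented sample data:

```python
import io

# Stand-in for the opened data file: two students, each name followed
# by four marks (made-up data, one value per line).
sample = io.StringIO("Ann\n90\n85\n70\n60\nBob\n55\n65\n75\n80\n")

marks = [[] for _ in range(5)]  # five independent lists

for i, line in enumerate(sample):
    # i % 5 cycles 0, 1, 2, 3, 4, 0, 1, ... so each line lands in
    # the list that matches its position within the 5-line record.
    marks[i % 5].append(line.strip())

# If separate names are preferred, unpack the five inner lists.
studentName, mark1, mark2, mark3, mark4 = marks
print(studentName)
print(mark1)
```

The values are still strings at this point; converting the marks to numbers would be a separate step.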

Upvotes: 3

gboffi

Reputation: 25023

The answer

s, g1, g2, g3, g4 = [[line.strip() for line in group_of_lines] for group_of_lines in zip(*zip(*[open('marks.txt')]*5))]

How I derived the answer

From the prompt of the ipython shell:

In [38]: cat marks.txt
s1
g11
g12
g13
g14
s2
g21
g22
g23
g24
s3
g31
g32
g33
g34

In [39]: zip(*[open('marks.txt')]*5)
Out[39]: 
[('s1\n', 'g11\n', 'g12\n', 'g13\n', 'g14\n'),
 ('s2\n', 'g21\n', 'g22\n', 'g23\n', 'g24\n'),
 ('s3\n', 'g31\n', 'g32\n', 'g33\n', 'g34\n')]

In [40]: zip(*zip(*[open('marks.txt')]*5))
Out[40]: 
[('s1\n', 's2\n', 's3\n'),
 ('g11\n', 'g21\n', 'g31\n'),
 ('g12\n', 'g22\n', 'g32\n'),
 ('g13\n', 'g23\n', 'g33\n'),
 ('g14\n', 'g24\n', 'g34\n')]

In [41]: [[line.strip() for line in group_of_lines] for group_of_lines in zip(*zip(*[open('marks.txt')]*5))]
Out[41]: 
[['s1', 's2', 's3'],
 ['g11', 'g21', 'g31'],
 ['g12', 'g22', 'g32'],
 ['g13', 'g23', 'g33'],
 ['g14', 'g24', 'g34']]

In [42]: s, g1, g2, g3, g4 = [[line.strip() for line in group_of_lines] for group_of_lines in zip(*zip(*[open('marks.txt')]*5))]

In [43]: print '\n'.join(map(str,(s,g1,g2,g3,g4)))
['s1', 's2', 's3']
['g11', 'g21', 'g31']
['g12', 'g22', 'g32']
['g13', 'g23', 'g33']
['g14', 'g24', 'g34']

In [44]:

A line by line commentary on "How I derived the answer"

[38]

My personal version of the marks.txt data file

[39]

The crux of the procedure, the grouper procedure shamelessly adapted from the itertools module fine docs.

A file object can be understood simply as an iterator that returns the file content line by line. So we start with a list containing 5 (identical) copies of a file iterator, that is, five references to the very same iterator over our data file, and pass the elements of this list (using the * star operator) to the zip builtin function, which returns a list of tuples with one element from each of its arguments, e.g.:

In [44]: zip(*[[1,2,3],[10,20,30]])
Out[44]: [(1, 10), (2, 20), (3, 30)]

Because zip is passed five identical copies of the same file iterator, it builds a list of tuples containing the first five lines, the second five lines, ..., of our file.
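The session above was recorded under Python 2, where zip returns a list. In Python 3 zip returns a lazy iterator, so the result has to be wrapped in list() to be displayed. A rough sketch of the same grouping and transposition, assuming an io.StringIO object with made-up contents in place of open('marks.txt'):

```python
import io

# Stand-in for open('marks.txt'); the contents are sample data only.
f = io.StringIO("s1\ng11\ng12\ng13\ng14\ns2\ng21\ng22\ng23\ng24\n")

# [f] * 5 holds five references to the SAME iterator, so each tuple
# that zip builds consumes five consecutive lines of the file.
groups = list(zip(*[f] * 5))
print(groups)

# Applying zip(*...) once more transposes rows into columns.
transposed = list(zip(*groups))
print(transposed)
```

Note that zip stops at the shortest argument, so a trailing record with fewer than five lines would be silently dropped by this idiom.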

[40]

But we want it the other way around! Or, in other words, we want to transpose our list of tuples.

Transposition of a sequence of sequences is usually obtained with an idiom that's very similar to what we've just seen...

In [45]: zip(*[(1, 10), (2, 20), (3, 30)])
Out[45]: [(1, 2, 3), (10, 20, 30)]

[41]

What's the matter with all these '\n' newline characters? Let's strip them away...

Our problem is that we have a double nesting, say a list of lists containing the elements that we want to correct...

We have no choice but to unpack the elements with a double loop, then pack the corrected, stripped items back into a list of lists...
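For readers less used to nested comprehensions, the sketch below shows the double loop both ways on a small made-up input (the names groups, cleaned and cleaned_loop are illustrative):

```python
# A small transposed result, newlines still attached (sample data).
groups = [("s1\n", "s2\n"), ("g11\n", "g21\n")]

# Nested list comprehension: the outer loop walks the groups, the
# inner loop strips every item of the current group.
cleaned = [[item.strip() for item in group] for group in groups]

# The same thing written as two explicit for loops.
cleaned_loop = []
for group in groups:
    row = []
    for item in group:
        row.append(item.strip())
    cleaned_loop.append(row)

print(cleaned)
print(cleaned == cleaned_loop)
```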

[42]

We have a list of lists whose elements are exactly what we want to associate with our variable names. This can be done in one sweep using what is called sequence unpacking...
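A minimal illustration of sequence unpacking on a made-up list of three rows; the number of names on the left must match the number of elements on the right, otherwise Python raises ValueError:

```python
rows = [["s1", "s2"], ["g11", "g21"], ["g12", "g22"]]  # sample data

# One name per element of rows.
s, g1, g2 = rows
print(s)
print(g2)

# A mismatched count is an error rather than a silent truncation.
try:
    only_a, only_b = rows
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```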

Statement 42 is the compact solution to our problem. We have all known for a long time that 42 is the answer; now, at last, we know the question too...

[43]

Just to verify that what we have in our variables is the result we are looking for.

Upvotes: 0

Aram

Reputation: 173

Try this. It makes a tuple of the lines from your file, then builds the lists of strings from that tuple as you like:

lines = tuple(open("marks.txt", 'r'))
list1, list2, list3, list4, list5 = [], [], [], [], []
i, linesCount = 0, len(lines)

while (i < linesCount):
    list1.append(lines[i].rstrip())
    i += 1
    if (i < linesCount):
        list2.append(lines[i].rstrip())
        i += 1
    if (i < linesCount):
        list3.append(lines[i].rstrip())
        i += 1
    if (i < linesCount):
        list4.append(lines[i].rstrip())
        i += 1
    if (i < linesCount):
        list5.append(lines[i].rstrip())
        i += 1

print list1, list2, list3, list4, list5
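As a side note, the same distribution can be sketched with extended slicing instead of the while loop. This is a different technique from the code above, shown here on made-up lines whose last record is deliberately incomplete:

```python
# Sample lines as they would come from the file; the second record
# is cut short on purpose to show how partial data is handled.
lines = ["Ann\n", "90\n", "85\n", "70\n", "60\n", "Bob\n", "55\n"]

stripped = [line.rstrip() for line in lines]

# stripped[k::5] takes every fifth element starting at index k,
# i.e. the k-th line of every 5-line record.
list1, list2, list3, list4, list5 = (stripped[k::5] for k in range(5))

print(list1, list2, list3, list4, list5)
```

Like the while loop, this keeps the lines of an incomplete final record, so the later lists may end up shorter than the earlier ones.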

Upvotes: 0

jrgilman

Reputation: 483

The problem you have here is that you haven't actually read the data from the dataFileRaw, you have simply instantiated the file using open(). You have to make sure you go about reading all of the data from the file via something along the lines of:

dataFromFile = dataFile.read()

This will pull all of the data into the dataFromFile variable as a single string. The nice part is that afterwards you can simply split dataFromFile into a list at the newline escape character \n (which is added automatically when you hit Enter in a text editor).

dataFromFile = dataFromFile.split("\n")[:-1]

The reason I added the [:-1] at the end is that if every line in the text file ends with \n (which it does if you pressed Enter after each one), the last element of the list will be an empty string, and you don't want to deal with that. The [:-1] slice selects the range from index 0 inclusive to the last element exclusive. Simply put, we drop the last list element.
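The effect of the trailing newline can be checked on a short made-up string. The sketch also shows str.splitlines(), a built-in alternative not used in this answer, which gives the same list whether or not the text ends with a newline:

```python
with_newline = "Ann\n90\n85\n"   # file ends with a newline (sample data)
without_newline = "Ann\n90\n85"  # file does not end with a newline

raw = with_newline.split("\n")           # last element is ''
trimmed = with_newline.split("\n")[:-1]  # empty element dropped
print(raw)
print(trimmed)

# Caveat: without a final newline, [:-1] drops a real value instead.
lost = without_newline.split("\n")[:-1]
print(lost)

# splitlines() handles both cases without any slicing.
print(with_newline.splitlines())
print(without_newline.splitlines())
```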

And from there you simply switch the for loop to iterate through dataFromFile rather than dataFile.

Upvotes: 1
