Split a list read in from a file at the commas into a list of separate elements

Question

The problem with reading the contents of a file into a list is that each line ends up as one big string. Students need to be able to work with this data read in from the file, for example to isolate the ID number and return the student.

I am aware of several ways this could be done, for instance regular expressions or converting to a string and using the split method, but I would be interested, for teaching purposes, in the easiest, most elegant method (and by elegant, I mean avoiding multiple and unnecessary steps). Ideally, is there a way to read it into the list, directly from the text file, in the required format?

For instance,

instead of the current format (which also includes the \n that I would need to strip):

['001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n', '002,Ash,Smith,Test1:22,Test2:63,Test3:99\n']

Required format: Either a 1d or 2d list like below

[['001','Joe','Bloggs','Test1:99','Test2:100','Test3:33'],['002','Ash','Smith','Test1:22','Test2:63','Test3:99']]

I'd be happy for people to post solutions including regex and string splitting, as it will help others, but is there a way to do this more simply?

Full code listing with text file (repl.it online):

https://repl.it/J8jB/2

Code:

f = open("studentinfo.txt","r") 
myList = []
for line in f:
    myList.append(line)
print(myList)
print()
print()
print(myList[0])
myList.split(",")
print(myList)

# Goal: split each string in the list at the "," so that its elements become separate items

Text file:

001,Joe,Bloggs,Test1:99,Test2:100,Test3:33
002,Ash,Smith,Test1:22,Test2:63,Test3:99

Jean-François Fabre · Accepted Answer

Once the list is built (or directly with the file handle as l; there's no need to store the list first), I would just rstrip and split in a list comprehension like this:

l = ['001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n', '002,Ash,Smith,Test1:22,Test2:63,Test3:99\n']

newl = [v.rstrip().split(",") for v in l]

print(newl)

result:

[['001', 'Joe', 'Bloggs', 'Test1:99', 'Test2:100', 'Test3:33'], ['002', 'Ash', 'Smith', 'Test1:22', 'Test2:63', 'Test3:99']]
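
The "directly with the file handle" variant might look like the sketch below. It assumes the file is called studentinfo.txt, as in the question; the temporary copy written at the top is a hypothetical setup step so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical setup: write the question's sample data to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "studentinfo.txt")
with open(path, "w") as f:
    f.write("001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n"
            "002,Ash,Smith,Test1:22,Test2:63,Test3:99\n")

# Iterate over the file handle itself; no intermediate list of raw lines is needed.
with open(path) as f:
    newl = [line.rstrip().split(",") for line in f]

print(newl)
```

The with block also closes the file automatically, which the original open() call in the question does not do.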

For a flat list, use a double loop instead (or itertools.chain.from_iterable; there are a lot of ways to do that):

newl = [x for v in l for x in v.rstrip().split(",")]

Without a list comprehension (just for "readability" when you're not used to them; after that, switch to list comprehensions :)):

newl = []
for v in l:
    newl.append(v.rstrip().split(","))

(use extend instead of append to get a flat list)
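
As a minimal sketch of that extend variant, using the same sample list l as above (the name flat is only illustrative):

```python
l = ['001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n',
     '002,Ash,Smith,Test1:22,Test2:63,Test3:99\n']

flat = []
for v in l:
    # extend adds each split element individually instead of nesting a sub-list
    flat.extend(v.rstrip().split(","))

print(flat)
```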

Of course, I always forget to mention csv, which uses the comma as its default separator and strips the newlines:

import csv
newl = list(csv.reader(l))

flat (using itertools this time):

import itertools
newl = list(itertools.chain.from_iterable(csv.reader(l)))

(l can be a file handle or a list of lines for the csv module)
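
To tie this back to the question's goal of isolating the ID and returning the student, one possible sketch passes the file handle to csv.reader and keys each row by its first field. The dictionary name students and the temporary file are assumptions made for illustration, not part of the original code.

```python
import csv
import os
import tempfile

# Hypothetical setup: write the question's sample data to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "studentinfo.txt")
with open(path, "w", newline="") as f:
    f.write("001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n"
            "002,Ash,Smith,Test1:22,Test2:63,Test3:99\n")

# The csv documentation recommends opening files with newline="".
with open(path, newline="") as f:
    # Key each row by its first field (the ID) so a student can be looked up directly.
    students = {row[0]: row for row in csv.reader(f)}

print(students["002"])
```

A dictionary is only one option here; with the 2d list, a loop that compares row[0] to the wanted ID would work just as well for a short file.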
