Compoot

Reputation: 2387

Split a list read in from a file at the commas into a list of separate elements

The problem with reading the contents of a file into a list is that each line ends up as one big string. Students need to be able to work with this data read in from the file, for example to isolate the ID number and return the matching student.

I am aware of several ways this could be done (for instance regular expressions, or converting to a string and using the split method), but for teaching purposes I would be interested in the easiest, most elegant method (by elegant, I mean avoiding multiple and unnecessary steps). Ideally, is there a way to read the text file directly into a list in the required format?

For instance, instead of the current format (which also includes \n that I would need to strip):

['001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n', '002,Ash,Smith,Test1:22,Test2:63,Test3:99\n']

Required format: either a 1D or 2D list like the one below:

[['001','Joe','Bloggs','Test1:99','Test2:100','Test3:33'],['002','Ash','Smith','Test1:22','Test2:63','Test3:99']]

I'd be happy for people to post solutions including regex and string splitting, as it will help others, but is there a simpler way to do this?

Full code listing with text file (repl.it online):

https://repl.it/J8jB/2

Code:

f = open("studentinfo.txt","r") 
myList = []
for line in f:
    myList.append(line)
print(myList)
print()
print()
print(myList[0])
myList.split(",")
print(myList)

# Goal: split each string in the list at the "," so that its individual elements are separated

Text file:

001,Joe,Bloggs,Test1:99,Test2:100,Test3:33
002,Ash,Smith,Test1:22,Test2:63,Test3:99

Upvotes: 1

Views: 2960

Answers (2)

Jean-François Fabre

Reputation: 140138

Once the list is built (or directly with the file handle as l; there's no need to store the list first), I would just rstrip and split in a list comprehension like this:

l = ['001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n', '002,Ash,Smith,Test1:22,Test2:63,Test3:99\n']

newl = [v.rstrip().split(",") for v in l]

print(newl)

Result:

[['001', 'Joe', 'Bloggs', 'Test1:99', 'Test2:100', 'Test3:33'], ['002', 'Ash', 'Smith', 'Test1:22', 'Test2:63', 'Test3:99']]
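
The remark about using the file handle directly could look like the sketch below. It is only an illustration: io.StringIO stands in for open("studentinfo.txt", "r") so that it runs without the file, and the name handle is an assumption, not part of the original answer.

```python
import io

# Stand-in for open("studentinfo.txt", "r"): iterating over it yields
# one line at a time, each still ending in "\n", just like a real file.
handle = io.StringIO(
    "001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n"
    "002,Ash,Smith,Test1:22,Test2:63,Test3:99\n"
)

# Iterate over the handle itself; no intermediate list of raw lines is built.
newl = [line.rstrip().split(",") for line in handle]

print(newl[0])
```

With a real file, the same comprehension would sit inside a with open(...) as handle: block.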

For a flat list, use a double loop instead (or itertools.chain.from_iterable; there are many ways to do that):

newl = [x for v in l for x in v.rstrip().split(",")]

Without a list comprehension (just for readability when you're not used to list comprehensions; after that, switch to list comprehensions :)):

newl = []
for v in l:
    newl.append(v.rstrip().split(","))

(Use extend instead of append to get a flat list.)
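
To make the append/extend difference concrete, here is a small side-by-side sketch. The names nested and flat are chosen for illustration only.

```python
l = ['001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n',
     '002,Ash,Smith,Test1:22,Test2:63,Test3:99\n']

nested = []  # will become a list of lists (2D)
flat = []    # will become a single flat list (1D)

for v in l:
    fields = v.rstrip().split(",")
    nested.append(fields)  # adds the whole list of fields as ONE element
    flat.extend(fields)    # adds each field as its own element

print(len(nested), len(flat))  # 2 12
```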

Of course, I always forget to mention the csv module, which uses a comma as its default separator and strips the newlines:

import csv
newl = list(csv.reader(l))

Flat (using itertools this time):

import itertools
newl = list(itertools.chain.from_iterable(csv.reader(l)))

(For the csv module, l can be a file handle or a list of lines.)
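
As a quick check of that last point, the sketch below feeds csv.reader first a list of lines and then a file-like object and compares the results. io.StringIO is used here as a stand-in for a real file handle, which is an assumption made so the example runs without studentinfo.txt.

```python
import csv
import io

lines = ['001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n',
         '002,Ash,Smith,Test1:22,Test2:63,Test3:99\n']

# csv.reader accepts any iterable that yields one line of text at a time.
from_list = list(csv.reader(lines))

# A file-like object works the same way (stand-in for an open file).
from_handle = list(csv.reader(io.StringIO("".join(lines))))

print(from_list == from_handle)  # True
```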

Upvotes: 4

Serge Ballesta

Reputation: 148860

That is a good use case for the csv module:

import csv

with open("studentinfo.txt","r") as f:
    rd = csv.reader(f)
    lst = list(rd)    # lst is a list of lists in expected format
    ...               # further processing on lst

Alternatively, it is trivial to process the file line by line:

with open("studentinfo.txt","r") as f:
    rd = csv.reader(f)
    for row in rd:          # row is list of fields
        ...                 # further processing on row
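
As one possible form of the "further processing", the sketch below looks up a student by ID, which is the use case mentioned in the question. The helper find_student is hypothetical (it is not part of the csv module or of the original answer), and io.StringIO again stands in for the opened file.

```python
import csv
import io


def find_student(rows, student_id):
    """Return the first row whose first field equals student_id, or None.

    Hypothetical helper for illustration; rows is any iterable of lists,
    such as a csv.reader object.
    """
    for row in rows:
        if row and row[0] == student_id:
            return row
    return None


# Stand-in for open("studentinfo.txt", "r").
f = io.StringIO(
    "001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n"
    "002,Ash,Smith,Test1:22,Test2:63,Test3:99\n"
)

rd = csv.reader(f)
student = find_student(rd, "002")
print(student)
```

IDs are compared as strings because csv.reader returns every field as a string, so "002" keeps its leading zeros. When reading a real file, the csv documentation recommends opening it with newline="".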

Upvotes: 2
