user1427661

Reputation: 11774

Problems Parsing Lines in a Text File

I have a .txt file with a number of lines in the format "subject,value,workload" that I want to store in a dictionary in the format dict[subject] = (value, workload). This is my code for doing so:

for line in inputFile:
        lineList.append(line.split(",", 3))
for i in range(0, len(lineList)):
        subjectDict[lineList[i][0]] = (lineList[i][1], lineList[i][2])
        print subjectDict[lineList[i][0]]

However, when I run the program, the values in subjectDict come back as tuples in this format: ('6', '2\r\n'). What's with this \r\n business? I'm assuming it has something to do with the line breaks in the text file, which is why I put the maximum of 3 in my split call in the first place, but it seems to be interpreting 2\r\n as one string. Also, is there a more efficient way to turn each of the items in the tuple into an integer, or should I just do:

subjectDict[lineList[i][0]] = (int(lineList[i][1]), int(lineList[i][2]))

Thanks.

Upvotes: 0

Views: 142

Answers (3)

GreyAsClay

Reputation: 11

Here is what I suggest, using a list comprehension:

with open(r"test.txt") as f:
    reBuff = [x.split(",") for x in f.readlines()]
    outDict = dict([(subject.strip(), (int(value.strip()), int(workload.strip()))) for subject,value,workload in reBuff])

Once you have a list or tuple of pairs in the format [(key, value), (key, value)], you can easily convert it to a dictionary by passing it to dict().
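As a small illustration of that last point, here is a sketch of the pairs-to-dictionary conversion on its own. The subject names and numbers are made-up sample data, not taken from the asker's file.

```python
# Hypothetical sample data standing in for the parsed lines of the file.
pairs = [("math", (6, 2)), ("physics", (4, 3))]

# dict() accepts any iterable of (key, value) pairs.
subject_dict = dict(pairs)

print(subject_dict["math"])
```

If the same subject appears on more than one line, the later pair overwrites the earlier one.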

Upvotes: 1

CrazyCasta

Reputation: 28312

Do the following:

for line in inputFile:
        lineList.append(line.strip().split(",", 3))
for i in range(0, len(lineList)):
        subjectDict[lineList[i][0]] = (lineList[i][1], lineList[i][2])
        print subjectDict[lineList[i][0]]

The strip method will get rid of any whitespace (including the \r\n characters) at the beginning and end of the string. The \r\n is the line ending (\r\n means you're probably opening a Windows file, Linux/Mac files generally use \n as the line ending).
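To see the difference on a single line, here is a minimal sketch. The sample line is hypothetical. It also shows that the second argument to split only limits how many splits are made, so it has no effect on the line ending.

```python
# Hypothetical line as it would be read from a file with Windows line endings.
raw_line = "math,6,2\r\n"

# maxsplit caps the number of splits; the trailing "\r\n" stays on the last field.
unstripped = raw_line.split(",", 3)

# Stripping first removes the line ending before the fields are separated.
stripped = raw_line.strip().split(",")

# rstrip("\r\n") removes only trailing line-ending characters, leaving
# other whitespace alone, if that matters for your data.
only_newline_removed = raw_line.rstrip("\r\n")

print(unstripped)  # ['math', '6', '2\r\n']
print(stripped)    # ['math', '6', '2']
```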

As far as I know int(someStrVar) is the most efficient way of converting to an integer.
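For reference, a quick sketch of the conversion with made-up values. Note that int() ignores leading and trailing whitespace, so it would accept the unstripped field too, although stripping the line first keeps the subject key clean as well.

```python
# Hypothetical field values as they might come out of split().
value = int("6")
workload = int("2\r\n")  # surrounding whitespace, including "\r\n", is ignored

print((value, workload))  # (6, 2)
```

A field that is not a valid integer, such as an empty string, raises ValueError.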

Assuming you have no interest in this lineList later you could do the following:

for line in inputFile:
        lineSplit = line.strip().split(",", 3)
        subjectDict[lineSplit[0]] = (lineSplit[1], lineSplit[2])
        print subjectDict[lineSplit[0]]

Upvotes: 1

Michael0x2a

Reputation: 64098

Try this:

output_dict = {}
with open(r"filename.txt") as f:
    for line in f:
        line = line.strip() # remove newlines and such (the '\r\n' bit)
        subject, value, workload = line.split(',', 3)
        output_dict[subject] = (int(value), int(workload))

So, I made several changes. I used line.strip() to remove any newlines (and surrounding whitespace) from your string. I also combined the two loops you had into one for efficiency.

To convert each item in a tuple to an int, you could do something like this:

my_tuple = tuple(int(i) for i in my_tuple)

...which is basically a generator expression converted into a tuple, but given that you only have two items to convert, it probably makes more sense to just type int(value) and int(workload), especially since you no longer have to type something unwieldy like int(lineList[i][1]).
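For completeness, here is a runnable sketch of the tuple conversion, using a made-up tuple of strings. The map() variant is an equivalent alternative, not something the code above relies on.

```python
# Hypothetical tuple of string fields, as produced by splitting a line.
raw_values = ("6", "2")

# Generator expression fed to tuple(): converts each item in turn.
as_ints = tuple(int(item) for item in raw_values)

# Equivalent alternative using map().
as_ints_via_map = tuple(map(int, raw_values))

print(as_ints)  # (6, 2)
```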

Upvotes: 1
