python_amateur

Reputation: 85

Python not properly reading in text file

I'm trying to read in a text file that looks something like this:

Date, StartTime, EndTime 
6/8/14, 1832, 1903
6/8/14, 1912, 1918
6/9/14, 1703, 1708
6/9/14, 1713, 1750

and this is what I have:

g = open('Observed_closure_info.txt', 'r')
closure_date=[]
closure_starttime=[]
closure_endtime=[]
file_data1 = g.readlines()
for line in file_data1[1:]:
    data1=line.split(', ')
    closure_date.append(str(data1[0]))
    closure_starttime.append(str(data1[1]))
    closure_endtime.append(str(data1[2]))

I read a previous, very similar file this way and everything worked fine, but this file isn't being read in properly. First I get a "list index out of range" error at closure_starttime.append(str(data1[1])), and when I print data1 or closure_date, I get something like

['\x006\x00/\x008\x00/\x001\x004\x00,\x00 \x001\x008\x003\x002\x00,\x00 \x001\x009\x000\x003\x00\r\x00\n']

I've tried rewriting the text file in case that particular file was corrupt, but it still does the same thing. I'm not sure why, because last time this worked fine.

Any suggestions? Thanks!

Upvotes: 6

Views: 1256

Answers (2)

efirvida

Reputation: 4855

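Try this. The file looks UTF-16 encoded, so decode it when opening it rather than line by line (raw UTF-16 bytes split on '\n' leave a stray '\x00' at the start of each line, which breaks a per-line decode):

import io

closure_date=[]
closure_starttime=[]
closure_endtime=[]
# io.open decodes the whole stream, so every line is already proper text
with io.open('Observed_closure_info.txt', 'r', encoding='utf-16') as g:
    file_data1 = g.readlines()
for line in file_data1[1:]:
    # strip the trailing newline before splitting on the ', ' separator
    data1=line.strip().split(', ')
    closure_date.append(str(data1[0]))
    closure_starttime.append(str(data1[1]))
    closure_endtime.append(str(data1[2]))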
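
A quick way to see where the \x00 bytes in the question come from, and why the bytes need to be decoded before they are split into lines. This is only a sketch: it assumes the file is little-endian UTF-16, and the sample string is copied from the question rather than read from the real file.

```python
# Two lines from the question, encoded the way the file appears to be stored
# (assumption: little-endian UTF-16).
raw = u'Date, StartTime, EndTime\r\n6/8/14, 1832, 1903\r\n'.encode('utf-16-le')

# Splitting the raw bytes on b'\n' cuts the two-byte newline (b'\n\x00') in half,
# so the second line starts with a stray b'\x00'.
second_raw = raw.split(b'\n')[1]

# The separator is stored as b',\x00 \x00', so b', ' is never found and the
# "split" returns a single item. That is why data1[1] is out of range.
raw_fields = second_raw.split(b', ')

# Decoding first and splitting afterwards gives the expected three fields.
text = raw.decode('utf-16-le')
fields = text.splitlines()[1].split(', ')

print(repr(second_raw))
print(len(raw_fields))
print(fields)
```

The first print shows the same \x00-interleaved data as in the question, while the last one shows the three clean fields.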

Upvotes: 1

nneonneo

Reputation: 179422

This looks like a comma-separated file with UTF-16 encoding (hence the \x00 null bytes). You'll have to decode the input from UTF-16, like so:

import codecs

closure_date=[]
closure_starttime=[]
closure_endtime=[]
with codecs.open('Observed_closure_info.txt', 'r', 'utf-16-le') as g:
    next(g)  # skip header line (next(g) works on Python 2.6+ and Python 3)
    for line in g:
        date, start, end = line.strip().split(', ')
        closure_date.append(date)
        closure_starttime.append(start)
        closure_endtime.append(end)
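
On Python 3 the built-in open() accepts an encoding argument, so the same idea might look roughly like the sketch below. It writes a small UTF-16 sample to a temporary directory so that it runs on its own; the temporary path and the sample rows are stand-ins for the real Observed_closure_info.txt, and it assumes the file starts with a byte order mark (if it does not, 'utf-16-le' as above is the safer choice). Using the csv module with skipinitialspace=True is a swapped-in alternative to split(', '), not something the question requires.

```python
import csv
import os
import tempfile

# Stand-in for the real file: a few rows from the question written as UTF-16.
sample = 'Date, StartTime, EndTime\r\n6/8/14, 1832, 1903\r\n6/8/14, 1912, 1918\r\n'
path = os.path.join(tempfile.mkdtemp(), 'Observed_closure_info.txt')
with open(path, 'w', encoding='utf-16', newline='') as f:
    f.write(sample)

closure_date = []
closure_starttime = []
closure_endtime = []

# open() decodes UTF-16 for us; newline='' lets the csv module handle line endings.
with open(path, 'r', encoding='utf-16', newline='') as g:
    reader = csv.reader(g, skipinitialspace=True)  # drops the space after each comma
    next(reader)  # skip header line
    for date, start, end in reader:
        closure_date.append(date)
        closure_starttime.append(start)
        closure_endtime.append(end)

print(closure_date, closure_starttime, closure_endtime)
```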

Upvotes: 6
