Reputation: 291
I have a very unusual txt file in which the observations are separated by long runs of whitespace, and my code raises the error shown below.
UPDATE: The issue is caused by all the mess at the top of the txt file (shown below). I have no idea how to deal with it other than using enumerate to skip those lines, and the problem is that I have more than 50 files to parse.
LocationCode IndustryCode OccupationCode TotalResults SourceCode CreatedOn UpdatedOn
-------------- ---------------------------------------
--------------------- ------ -------------------------------------------------- ------------ ----- ------- -------------- ------------ ---------- --------------------------- ---------------------------
rftergt------------------
Error: IndexError: list index out of range
Please see a couple of lines from the txt file:
8969758 35175 2018-05-03 18:32:11.9629608 21CIWS 130 NULL 2685 JSW 2018-05-03 18:32:12.1213757 2018-05-03 18:32:12.1213757
8969759 37132 2018-05-03 18:32:12.3444130 49TWNQ NULL NULL 654 JSW 2018-05-03 18:32:12.5069561 2018-05-03 18:32:12.5069561
8969761 319150 2018-05-03 18:32:16.6022496 49MCKY NULL NULL 678 JSW 2018-05-03 18:32:16.7648819 2018-05-03 18:32:16.7648819
My code:
import csv

first_row = True
with open("10_JobSearchLog.txt", 'r') as f:
    reader = csv.reader(f, delimiter=",")
    header = next(reader)
    for line in f:
        if first_row:
            first_row = False
            continue
        line = line.strip().split(" ")
        print(line)
        buck1, buck2, buck3, buck4 = line[0], line[1], line[3], line[4]
Upvotes: 0
Views: 66
Reputation: 1753
To resolve the issue described in the update, modify your code as follows:
with open(filename) as infile:
    header = next(infile)  # Skip the first line (header)
    for line in infile:
        if line.strip():  # Check that the line is not empty
            line = line.split()  # Split on any run of whitespace
            if len(line) >= 5:  # line[4] is used below, so at least 5 fields are needed
                buck1, buck2, buck3, buck4 = line[0], line[1], line[3], line[4]
                print(buck1, buck2, buck3, buck4)
By doing this, you make sure that any line with too few fields to match your format is skipped, so short junk lines at the top of the file are ignored :) Note that a header line with enough whitespace-separated tokens would still pass a length check.
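One way to make the filter stricter than a length check is sketched below, assuming every data row begins with a purely numeric ID, as in the sample lines from the question. The function name iter_records and the filtering rule are illustrative assumptions, not part of the original code.

```python
def iter_records(lines):
    """Yield the four wanted fields from lines that look like data rows."""
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        # Assumed rule: a data row has at least 5 fields and a numeric first field.
        if len(fields) >= 5 and fields[0].isdigit():
            yield fields[0], fields[1], fields[3], fields[4]


# Small in-memory sample: header text, a dashed separator, a blank line, one data row.
sample = [
    "LocationCode IndustryCode OccupationCode TotalResults SourceCode CreatedOn UpdatedOn",
    "-------------- ---------------------------------------",
    "",
    "8969758   35175   2018-05-03 18:32:11.9629608   21CIWS   130   NULL   2685   JSW",
]
records = list(iter_records(sample))
print(records)
```

Because an open file object is itself an iterable of lines, the same function could be called as iter_records(infile) inside a loop over the paths returned by glob.glob("*.txt"), which would cover all 50 files without hard-coding how many junk lines each one has.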
Upvotes: 1
Reputation: 82785
Use split() with no arguments, which splits on any run of whitespace:
with open(filename) as infile:
    header = next(infile)  # Skip the first line (header)
    for line in infile:
        if line.strip():  # Check that the line is not empty
            line = line.split()  # Split on any run of whitespace
            buck1, buck2, buck3, buck4 = line[0], line[1], line[3], line[4]
            print(buck1, buck2, buck3, buck4)
Output:
('8969759', '37132', '18:32:12.3444130', '49TWNQ')
('8969761', '319150', '18:32:16.6022496', '49MCKY')
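The difference between split(" ") and split() is likely what caused the original IndexError. The short sketch below illustrates it, assuming the fields are padded with several spaces as the question describes; the sample row is shortened for illustration.

```python
row = "8969759   37132   2018-05-03 18:32:12.3444130"

by_single_space = row.split(" ")  # splits at every single space character
by_whitespace = row.split()       # splits at every run of whitespace

print(by_single_space)  # contains empty strings between padded fields
print(by_whitespace)    # contains only the real fields

# A blank line behaves differently too:
blank_single = "".split(" ")  # a list with one empty string
blank_default = "".split()    # an empty list
print(blank_single, blank_default)
```

With split(" "), a blank line becomes a one-element list, so line[1] raises IndexError, and padded rows put empty strings where the fields were expected.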
Upvotes: 2