Ian_De_Oliveira

Reputation: 291

Reading a txt file in Python with more than one space between observations

I have a very unusual txt file in which the observations are separated by long runs of whitespace, and my code fails with the following error:

UPDATE: The issue is that the top of the txt file contains all this mess, and I have no idea how to deal with it apart from using enumerate to skip those lines. The problem is that I have more than 50 files to parse:

LocationCode IndustryCode OccupationCode TotalResults SourceCode           CreatedOn                   UpdatedOn

-------------- --------------------------------------- 
---------------------    ------ -------------------------------------------------- ------------ -----     ------- -------------- ------------ ---------- ---------------------------      ---------------------------
        rftergt------------------




Error: IndexError: list index out of range

Please see a couple of lines from the txt file:

8969758        35175                                   2018-05-03 18:32:11.9629608                                                    21CIWS       130          NULL           2685         JSW        2018-05-03 18:32:12.1213757 2018-05-03 18:32:12.1213757

8969759        37132                                   2018-05-03 18:32:12.3444130                                                    49TWNQ       NULL         NULL           654          JSW        2018-05-03 18:32:12.5069561 2018-05-03 18:32:12.5069561

8969761        319150                                  2018-05-03 18:32:16.6022496                                                    49MCKY       NULL         NULL           678          JSW        2018-05-03 18:32:16.7648819 2018-05-03 18:32:16.7648819
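With no argument, `str.split()` splits on any run of whitespace and discards empty strings, so each sample row becomes a flat list of tokens. Note that the date and the time of each timestamp become separate tokens, which is why index 3 (not 2) holds the first time value. A quick check against the first sample row (the spacing is copied from the question):

```python
# First sample row from the file, with its runs of spaces preserved
row = (
    "8969758        35175                                   "
    "2018-05-03 18:32:11.9629608                                                    "
    "21CIWS       130          NULL           2685         JSW        "
    "2018-05-03 18:32:12.1213757 2018-05-03 18:32:12.1213757"
)

tokens = row.split()  # any whitespace run is one separator; no empty strings
print(len(tokens))    # 13
print(tokens[:5])     # ['8969758', '35175', '2018-05-03', '18:32:11.9629608', '21CIWS']

# The four fields the question's code picks out
picked = (tokens[0], tokens[1], tokens[3], tokens[4])
print(picked)         # ('8969758', '35175', '18:32:11.9629608', '21CIWS')
```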

My code:

import csv

first_row = True
with open("10_JobSearchLog.txt", 'r') as f:

    reader = csv.reader(f, delimiter=",")
    header = next(reader)  # consumes the first line of the file

    for line in f:
        if first_row:  # skips one more line after the header
            first_row = False
            continue

        line = line.strip().split(" ")  # splits on every single space
        print(line)
        buck1, buck2, buck3, buck4 = line[0], line[1], line[3], line[4]
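The `IndexError` comes from splitting on a single space. `str.split(" ")` treats every individual space as a separator, so runs of spaces produce empty strings, and a line with no inner spaces (a blank line, or `        rftergt------------------` after stripping) yields a one-element list, so `line[1]` is out of range. A small comparison using shortened versions of the question's lines:

```python
data_line = "8969758        35175   2018-05-03 18:32:11.9629608"
blank_line = ""
junk_line = "        rftergt------------------"

# split(" "): every single space is a separator, so runs of spaces leave empty strings
print(data_line.split(" ")[:4])  # ['8969758', '', '', '']

# split(): any whitespace run is one separator and empty strings are dropped
print(data_line.split())         # ['8969758', '35175', '2018-05-03', '18:32:11.9629608']

# Lines with too few fields are what trigger the IndexError on line[1]
for raw in (blank_line, junk_line):
    parts = raw.strip().split(" ")
    print(parts, len(parts))
    try:
        parts[1]
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: list index out of range
```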

Upvotes: 0

Views: 66

Answers (2)

Imtinan Azhar

Reputation: 1753

To resolve the issue in your update, modify your code like this:

with open(filename) as infile:
    header = next(infile)            # Skip the header line
    for line in infile:
        if line.strip():             # Check that the line is not empty
            line = line.split()      # Split on any run of whitespace
            if len(line) >= 5:       # line[4] needs at least 5 fields
                buck1,buck2,buck3,buck4 = line[0],line[1],line[3],line[4]
                print(buck1,buck2,buck3,buck4)

By doing this, any line that doesn't match your expected format (such as blank lines or separator lines with fewer than five fields) is skipped, so most of those messy header lines are skipped too :)
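A field-count check alone still lets through a dashed separator row that has enough fields (the long dashed line in the question's header does). One possible tightening, assuming every real record starts with a numeric ID as in the sample rows, is to also require the first field to be all digits. A sketch with a hypothetical helper name `parse_records`:

```python
def parse_records(lines):
    """Yield (buck1, buck2, buck3, buck4) for lines that look like data rows.

    Assumption: every real record starts with a numeric ID, so header text,
    dashed separators and blank lines are all rejected by the same test.
    """
    for raw in lines:
        fields = raw.split()  # split on any run of whitespace
        if len(fields) >= 5 and fields[0].isdigit():
            yield fields[0], fields[1], fields[3], fields[4]


# Any iterable of lines works, including an open file object
sample = [
    "LocationCode IndustryCode OccupationCode TotalResults SourceCode\n",
    "\n",
    "------ ----- ----- ----- ----- -----\n",
    "        rftergt------------------\n",
    "8969758        35175     2018-05-03 18:32:11.9629608     21CIWS   130\n",
]

for record in parse_records(sample):
    print(record)  # ('8969758', '35175', '18:32:11.9629608', '21CIWS')
```

With a real file, `for record in parse_records(infile):` inside `with open("10_JobSearchLog.txt") as infile:` replaces the loop; no `next(infile)` is needed because the header row fails the digit test.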

Upvotes: 1

Rakesh

Reputation: 82785

Use this:

with open(filename) as infile:
    header = next(infile)  #Header
    for line in infile:
        if line.strip():             #Check if line is not empty
            line = line.split()      #Split line by space
            buck1,buck2,buck3,buck4 = line[0],line[1],line[3],line[4]
            print(buck1,buck2,buck3,buck4)

Output:

('8969759', '37132', '18:32:12.3444130', '49TWNQ')
('8969761', '319150', '18:32:16.6022496', '49MCKY')

Upvotes: 2
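Since the update mentions more than 50 files with the same layout, the per-line logic can be wrapped in a function and run over every matching file. A sketch, assuming all files sit in the current directory, match the hypothetical glob pattern `*_JobSearchLog.txt`, and (as above) that data rows start with a numeric ID:

```python
import glob


def read_file(path):
    """Return (buck1, buck2, buck3, buck4) tuples from one log file."""
    records = []
    with open(path) as infile:
        for line in infile:
            fields = line.split()  # any run of whitespace is one separator
            # Skip blank, short, header and separator lines
            # (assumes real records start with a numeric ID)
            if len(fields) >= 5 and fields[0].isdigit():
                records.append((fields[0], fields[1], fields[3], fields[4]))
    return records


# Hypothetical file pattern; adjust it to the real file names
all_records = {}
for path in sorted(glob.glob("*_JobSearchLog.txt")):
    all_records[path] = read_file(path)
    print(path, len(all_records[path]), "records")
```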
