Reading in file with different number of spaces as delimiter?

Question

I'm trying to read in a file, but it's awkward because the number of spaces between columns varies from line to line. This is what I have so far:

with open('sextractordata1488.csv') as f:
    #getting rid of title, aka unusable lines:
    for _ in xrange(15):
        next(f)
    for line in f:
        cols = line.split(' ')
        #9 because it's 9 spaces before the first column with real data
        print cols[10]

I looked up how to do this and found tr and sed commands, but they gave syntax errors when I tried them, and I wasn't sure where in the code to put them (in the for loop or before it?). I want to reduce all the spaces between columns to a single space so that I can reliably pick out one column. At the moment, because it's a counter column running from 1 to 101, I only get 10 through 99, plus a bunch of spaces and fragments of other columns in between: 1 and 101 have a different number of digits, and therefore a different number of spaces from the beginning of the line.

Martijn Pieters · Accepted Answer

Just use str.split() without an argument. The string is then split on arbitrary-width whitespace, so it no longer matters how many spaces there are between the non-whitespace content:

>>> '   this   is rather     \t\t hard            to parse  without\thelp\n'.split()
['this', 'is', 'rather', 'hard', 'to', 'parse', 'without', 'help']

Note that leading and trailing whitespace are removed as well. Tabs, spaces, newlines, and carriage returns are all considered whitespace.

For completeness' sake, the first argument can also be set to None for the same effect. This is helpful to know when you need to limit the number of splits with the second argument:

>>> '   this   is rather     \t\t hard            to parse  without\thelp\n'.split(None)
['this', 'is', 'rather', 'hard', 'to', 'parse', 'without', 'help']
>>> '   this   is rather     \t\t hard            to parse  without\thelp\n'.split(None, 3)
['this', 'is', 'rather', 'hard            to parse  without\thelp\n']
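
Applied to the loop in the question, this might look like the sketch below. It assumes the counter is the first whitespace-delimited column (index 0 after splitting) and that the first 15 lines are headers, as in the question. The function name read_column and its parameters are hypothetical, chosen for illustration, and the code is written for Python 3.

```python
from itertools import islice


def read_column(lines, column=0, skip=15):
    """Return one whitespace-delimited column from an iterable of lines.

    `read_column`, `column`, and `skip` are hypothetical names for this sketch.
    """
    values = []
    # islice skips the first `skip` header lines without reading the whole file.
    for line in islice(lines, skip, None):
        fields = line.split()  # no argument: split on runs of whitespace
        if len(fields) > column:  # ignore blank or too-short lines
            values.append(fields[column])
    return values


# A file object is an iterable of lines, so it can be passed in directly:
# with open('sextractordata1488.csv') as f:
#     print(read_column(f))
```

Because split() drops the leading whitespace, the counter lands at index 0 whether it has one digit or three, so no per-line offset is needed.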
