Derek Swingley

Reputation: 8752

Better Way to Write This List Comprehension?

I'm parsing a string that doesn't have a delimiter but does have specific indexes where fields start and stop. Here's my list comprehension to generate a list from the string:

field_breaks = [(0,2), (2,10), (10,13), (13, 21), (21, 32), (32, 43), (43, 51), (51, 54), (54, 55), (55, 57), (57, 61), (61, 63), (63, 113), (113, 163), (163, 213), (213, 238), (238, 240), (240, 250), (250, 300)]
s = '4100100297LICACTIVE  09-JUN-198131-DEC-2010P0         Y12490227WYVERN RESTAURANTS INC                            1351 HEALDSBURG AVE                                                                                 HEALDSBURG               CA95448     ROUND TABLE PIZZA                                 575 W COLLEGE AVE                                 STE 201                                           SANTA ROSA               CA95401               '
data = [s[x[0]:x[1]].strip() for x in field_breaks]

Any recommendation on how to improve this?

Upvotes: 2

Views: 275

Answers (4)

Owen S.

Reputation: 7855

To be honest, I don't find the parse-by-column-number approach very readable, and I question its maintainability (off-by-one errors and the like). That said, I'm sure the list comprehensions are very virtuous and efficient in this case, and the suggested zip-based solution has a nice functional tweak to it.

Instead, I'm going to throw softballs from out here in left field, since list comprehensions are supposed to be in part about making your code more declarative. For something completely different, consider the following approach based on the pyparsing module:

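from pyparsing import Word, Combine, Literal, nums, alphas

def Fixed(chars, width):
    # A token of exactly `width` characters drawn from `chars`.
    return Word(chars, exact=width)

myDate = Combine(Fixed(nums,2) + Literal('-') + Fixed(alphas,3) + Literal('-')
                 + Fixed(nums,4))

# The parentheses let the expression continue across lines.
fullRow = (Fixed(nums,2) + Fixed(nums,8) + Fixed(alphas,3) + Fixed(alphas,8)
           + myDate + myDate + ...)

data = fullRow.parseString(s)
# should be ['41', '00100297', 'LIC', 'ACTIVE  ', 
#            '09-JUN-1981', '31-DEC-2010', ...]
# Caveat: as written, Fixed(alphas,8) cannot match a space-padded value such
# as 'ACTIVE  '; the pad character would have to be allowed in that field.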

To make this even more declarative, you could name each of the fields as you come across them. I have no idea what the fields actually are, but something like:

someId = Fixed(nums,2)
someOtherId = Fixed(nums,8)
recordType = Fixed(alphas,3)
recordStatus = Fixed(alphas,8)
birthDate = myDate
issueDate = myDate
# The parentheses let the expression continue across lines.
fullRow = (someId + someOtherId + recordType + recordStatus
           + birthDate + issueDate + ...)

Now an approach like this probably isn't going to break any land speed records. But, holy cow, wouldn't you find this easier to read and maintain?
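If pulling in pyparsing is more than you want, the same "name the fields" idea can be sketched with the standard library alone. This is a different technique from the pyparsing grammar above (plain slicing driven by a name/width table), and the field names are hypothetical, just as in the answer. Only the first six columns of the sample row are covered:

```python
from itertools import accumulate

# Hypothetical field names with their widths, in column order.
FIELDS = [
    ("some_id", 2),
    ("some_other_id", 8),
    ("record_type", 3),
    ("record_status", 8),
    ("birth_date", 11),
    ("issue_date", 11),
]

def parse_row(row, fields=FIELDS):
    """Slice a fixed-width row into a dict of stripped values keyed by name."""
    widths = [width for _, width in fields]
    ends = list(accumulate(widths))   # running end offsets: 2, 10, 13, ...
    starts = [0] + ends[:-1]          # each field starts where the previous ended
    return {
        name: row[start:end].strip()
        for (name, _), start, end in zip(fields, starts, ends)
    }

record = parse_row("4100100297LICACTIVE  09-JUN-198131-DEC-2010")
```

Here record["record_status"] is 'ACTIVE' and record["issue_date"] is '31-DEC-2010'. Declaring widths rather than start/stop pairs means there are no offsets to keep in sync by hand.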

Upvotes: 3

John La Rooy

Reputation: 304137

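Here is a way using map (Python 2 only, since str.__getslice__ was removed in Python 3; note that it does not strip the fields):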

data = map(s.__getslice__, *zip(*field_breaks))
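For Python 3, one possible equivalent is sketched below. It is not the answer's original code: it builds slice objects with itertools.starmap and passes them to s.__getitem__, then strips each field. The string is shortened to the first four fields of the sample row.

```python
from itertools import starmap

field_breaks = [(0, 2), (2, 10), (10, 13), (13, 21)]
s = "4100100297LICACTIVE  "

# Turn each (start, stop) pair into a slice object, index the string
# with it, then strip the padding from every field.
slices = starmap(slice, field_breaks)
data = list(map(str.strip, map(s.__getitem__, slices)))
# data == ['41', '00100297', 'LIC', 'ACTIVE']
```

Whether this reads better than the plain list comprehension is debatable; it mainly shows that the map-based style still works without __getslice__.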

Upvotes: 0

dan04

Reputation: 90995

You can cut your field_breaks list in half, since each field starts exactly where the previous one ends:

field_breaks = [0, 2, 10, 13, 21, 32, 43, ..., 250, 300]
s = ...
data = [s[x[0]:x[1]].strip() for x in zip(field_breaks[:-1], field_breaks[1:])]
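As a runnable illustration of the flat-list idea, here is a minimal sketch using only the first six fields of the sample row (the shortened string and break list are cut down from the question's data):

```python
# Boundaries only: field i spans field_breaks[i] to field_breaks[i + 1].
field_breaks = [0, 2, 10, 13, 21, 32, 43]
s = "4100100297LICACTIVE  09-JUN-198131-DEC-2010"

# zip stops at the shorter input, so no [:-1] is needed on the first argument.
pairs = zip(field_breaks, field_breaks[1:])
data = [s[start:end].strip() for start, end in pairs]
# data == ['41', '00100297', 'LIC', 'ACTIVE', '09-JUN-1981', '31-DEC-2010']
```

On Python 3.10 and later, itertools.pairwise(field_breaks) yields the same pairs.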

Upvotes: 7

Tomasz Wysocki

Reputation: 11568

You can use tuple unpacking for cleaner code:

data = [s[a:b].strip() for a,b in field_breaks]

Upvotes: 7
