Reputation: 8752
I'm parsing a string that doesn't have a delimiter but does have specific indexes where fields start and stop. Here's my list comprehension to generate a list from the string:
field_breaks = [(0,2), (2,10), (10,13), (13, 21), (21, 32), (32, 43), (43, 51), (51, 54), (54, 55), (55, 57), (57, 61), (61, 63), (63, 113), (113, 163), (163, 213), (213, 238), (238, 240), (240, 250), (250, 300)]
s = '4100100297LICACTIVE 09-JUN-198131-DEC-2010P0 Y12490227WYVERN RESTAURANTS INC 1351 HEALDSBURG AVE HEALDSBURG CA95448 ROUND TABLE PIZZA 575 W COLLEGE AVE STE 201 SANTA ROSA CA95401 '
data = [s[x[0]:x[1]].strip() for x in field_breaks]
Any recommendation on how to improve this?
Upvotes: 2
Views: 275
Reputation: 7855
To be honest, I don't find the parse-by-column-number approach very readable, and I question its maintainability (off-by-one errors and the like). That said, I'm sure the list comprehensions are very virtuous and efficient in this case, and the suggested zip-based solution has a nice functional tweak to it.
Instead, I'm going to throw softballs from out here in left field, since list comprehensions are supposed to be in part about making your code more declarative. For something completely different, consider the following approach based on the pyparsing module:
from pyparsing import Word, Combine, Literal, nums, alphas

def Fixed(chars, width):
    # Match exactly `width` characters drawn from `chars`
    return Word(chars, exact=width)

myDate = Combine(Fixed(nums,2) + Literal('-') + Fixed(alphas,3) + Literal('-')
                 + Fixed(nums,4))
# Parentheses let the expression continue across lines
fullRow = (Fixed(nums,2) + Fixed(nums,8) + Fixed(alphas,3) + Fixed(alphas,8)
           + myDate + myDate + ...)
data = fullRow.parseString(s)
# should be ['41', '00100297', 'LIC', 'ACTIVE ',
# '09-JUN-1981', '31-DEC-2010', ...]
To make this even more declarative, you could name each of the fields as you come across them. I have no idea what the fields actually are, but something like:
someId = Fixed(nums,2)
someOtherId = Fixed(nums,8)
recordType = Fixed(alphas,3)
recordStatus = Fixed(alphas,8)
birthDate = myDate
issueDate = myDate
fullRow = (someId + someOtherId + recordType + recordStatus
           + birthDate + issueDate + ...)
Now an approach like this probably isn't going to break any land speed records. But, holy cow, wouldn't you find this easier to read and maintain?
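For comparison, the same declarative idea can be sketched with only the standard library: declare each field as a (name, width) pair and let the code derive the offsets, so nobody has to keep start and stop indexes in sync by hand. The field names below are placeholders (the real meanings are unknown), and the sample row is a shortened illustration with assumed padding, not the full 300-character record.

```python
from itertools import accumulate

# Hypothetical field names; widths taken from the first six field_breaks entries.
FIELDS = [
    ("some_id", 2),
    ("some_other_id", 8),
    ("record_type", 3),
    ("record_status", 8),
    ("birth_date", 11),
    ("issue_date", 11),
]

def parse_fixed(line, fields=FIELDS):
    """Slice `line` into a dict of stripped values, one per (name, width) pair."""
    widths = [width for _, width in fields]
    ends = list(accumulate(widths))   # running totals: 2, 10, 13, ...
    starts = [0] + ends[:-1]          # each field starts where the previous one ended
    return {
        name: line[start:end].strip()
        for (name, _), start, end in zip(fields, starts, ends)
    }

# Shortened sample row; the status field is assumed to be padded to 8 characters.
sample = "41" + "00100297" + "LIC" + "ACTIVE  " + "09-JUN-1981" + "31-DEC-2010"
record = parse_fixed(sample)
```

Changing a field's width here means editing one number, and every later offset follows automatically.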
Upvotes: 3
Reputation: 304137
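Here is a way using map. Note that this works in Python 2 only (str.__getslice__ was removed in Python 3), and unlike the original it does not strip the fields: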
data = map(s.__getslice__, *zip(*field_breaks))
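On Python 3 a rough equivalent of the same map-based idea is to build slice objects and pass them to s.__getitem__. This is only a sketch; the short string and breaks below are illustrative stand-ins for the real record.

```python
from itertools import starmap

# Illustrative stand-ins for the real record and its field boundaries.
s = "41" + "00100297" + "LIC"
field_breaks = [(0, 2), (2, 10), (10, 13)]

# starmap(slice, ...) turns each (start, stop) pair into slice(start, stop);
# s.__getitem__(slice(start, stop)) is what s[start:stop] calls under the hood.
data = list(map(s.__getitem__, starmap(slice, field_breaks)))
```

If the padding should be removed as in the question, a further map with str.strip over the result would do it.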
Upvotes: 0
Reputation: 90995
You can cut your field_breaks list in half by doing:
field_breaks = [0, 2, 10, 13, 21, 32, 43, ..., 250, 300]
s = ...
data = [s[x[0]:x[1]].strip() for x in zip(field_breaks[:-1], field_breaks[1:])]
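As a quick check of the boundary-list idea, here is a small self-contained run on the first six fields only. The sample string is a shortened illustration, with the status field assumed to be padded to 8 characters.

```python
# Boundaries for the first six fields only (taken from the question's field_breaks).
field_breaks = [0, 2, 10, 13, 21, 32, 43]

# Shortened sample; the padding of the status field is an assumption.
s = "41" + "00100297" + "LIC" + "ACTIVE  " + "09-JUN-1981" + "31-DEC-2010"

# zip pairs each boundary with the next one: (0, 2), (2, 10), (10, 13), ...
pairs = list(zip(field_breaks[:-1], field_breaks[1:]))
data = [s[start:stop].strip() for start, stop in pairs]
```

Because each stop index doubles as the next start index, the fields cannot accidentally overlap or leave gaps.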
Upvotes: 7
Reputation: 11568
You can use tuple unpacking for cleaner code:
data = [s[a:b].strip() for a,b in field_breaks]
Upvotes: 7