Reputation: 21917
I have a number of codes to process. They come through in several different formats, which I first need to manipulate into the right format:
Examples of codes:
ABC1.12 - correct format
ABC 1.22 - space between letters and numbers
ABC1.12/13 - 2 codes joined together and leading 1. missing from 13, should be ABC1.12 and ABC1.13
ABC 1.12 / 1.13 - codes joined together and spaces
I know how to remove the spaces, but I am not sure how to handle the codes which have been joined together. I know I can use the split function to create 2 codes, but I am not sure how to then prepend the letters (and first number part) to the second code. These are the 3rd and 4th examples in the list above.
WHAT I HAVE SO FAR
val = "ABC1.12/13"  # code, e.g. the 3rd example above
retList = [val]
if "/" in val:
    (code1, code2) = session_codes = val.split("/", 1)
    (initial_letters, numbers) = code1.split(".", 1)
    if initial_letters not in code2:
        code2 = initial_letters + '.' + code2
    # reset list so that it returns both values
    retList = [code1, code2]
This doesn't handle the 4th example, though: there code2 becomes ABC1.1.13.
Upvotes: 3
Views: 174
Reputation: 4828
Take a look at this method. It might be the simplest way to do it.
val = unicode(raw_input())
for aChar in val:
    if aChar.isnumeric():
        lastIndex = val.index(aChar)
        break
part1 = val[:lastIndex].strip()
part2 = val[lastIndex:]
if "/" not in part2:
    print part1+part2
else:
    if " " not in part2:
        codes = []
        divPart2 = part2.split(".")
        partCodes = divPart2[1].split("/")
        for aPart in partCodes:
            codes.append(part1+divPart2[0]+"."+aPart)
        print codes
    else:
        codes = []
        divPart2 = part2.split("/")
        for aPart in divPart2:
            aPart = aPart.strip()
            codes.append(part1+aPart)
        print codes
Upvotes: 0
Reputation: 63707
You can use regex for this purpose.
A possible implementation would be as follows:
>>> import re
>>> def foo(st):
...     parts=st.replace(' ','').split("/")
...     parts=list(re.findall("^([A-Za-z]+)(.*)$",parts[0])[0])+parts[1:]
...     parts=parts[0:1]+[x.split('.') for x in parts[1:]]
...     parts=parts[0:1]+['.'.join(x) if len(x) > 1 else '.'.join([parts[1][0],x[0]]) for x in parts[1:]]
...     return [parts[0]+p for p in parts[1:]]
>>> foo('ABC1.12')
['ABC1.12']
>>> foo('ABC 1.22')
['ABC1.22']
>>> foo('ABC1.12/13')
['ABC1.12', 'ABC1.13']
>>> foo('ABC 1.12 / 1.13')
['ABC1.12', 'ABC1.13']
>>>
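The same idea can be written a little more explicitly. This is only a sketch for Python 3: the name expand_codes is made up here, and it assumes every first code has the shape letters, digits, a dot, digits (as in the four examples from the question).

```python
import re

# Assumed shape of the first code: letters, then "<major>.<minor>".
FIRST_CODE = re.compile(r"^([A-Za-z]+)(\d+)\.(\d+)$")


def expand_codes(raw):
    """Split a possibly joined code string into a list of full codes."""
    # Drop all whitespace, then separate the joined parts.
    head, *rest = re.sub(r"\s+", "", raw).split("/")
    match = FIRST_CODE.match(head)
    if match is None:
        raise ValueError("unrecognised code: %r" % raw)
    letters, major, _minor = match.groups()
    codes = [head]
    for part in rest:
        # "13" lacks the leading "1."; "1.13" already carries it.
        suffix = part if "." in part else major + "." + part
        codes.append(letters + suffix)
    return codes


if __name__ == "__main__":
    for example in ("ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"):
        print(example, "->", expand_codes(example))
```

Raising ValueError on anything that does not fit the assumed shape is a design choice of this sketch; returning the input unchanged would be just as reasonable.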
Upvotes: 3
Reputation: 88118
The answer by @Abhijit is a good one, and for this simple problem regex may be the way to go. However, when dealing with parsing problems, you'll often need a more extensible solution that can grow with your problem. I've found that pyparsing is great for that: you write the grammar, it does the parsing:
from pyparsing import *
index = Combine(Word(alphas))
# Define what a number is and convert it to a float
number = Combine(Word(nums)+Optional('.'+Optional(Word(nums))))
number.setParseAction(lambda x: float(x[0]))
# What do extra numbers look like?
marker = Word('/').suppress()
extra_numbers = marker + number
# Define what a possible line could be
line_code = Group(index + number + ZeroOrMore(extra_numbers))
grammar = OneOrMore(line_code)
From this definition we can parse the string:
S = '''ABC1.12
ABC 1.22
XXX1.12/13/77/32.
XYZ 1.12 / 1.13
'''
print grammar.parseString(S)
Giving:
[['ABC', 1.12], ['ABC', 1.22], ['XXX', 1.12, 13.0, 77.0, 32.0], ['XYZ', 1.12, 1.13]]
The numbers are now in the correct format, as we've cast them to floats during the parsing. Many more "numbers" are handled: look at the index "XXX", where numbers of the form 1.12, 13 and 32. are all parsed, regardless of whether they contain a decimal point.
Upvotes: 0
Reputation: 3191
I suggest you write a regular expression for each code pattern and then form a larger regular expression which is the union of the individual ones.
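As a rough sketch of what that could look like in Python 3 (the pattern names, the PATTERNS table and the classify helper are all hypothetical, and the three patterns only cover the formats listed in the question):

```python
import re

# One base pattern: letters, optional spaces, "<digits>.<digits>".
CODE = r"[A-Za-z]+\s*\d+\.\d+"

# One regular expression per code format (names are made up for this sketch).
PATTERNS = {
    "single": CODE,                            # ABC1.12 or ABC 1.22
    "joined_short": CODE + r"\s*/\s*\d+",      # ABC1.12/13
    "joined_full": CODE + r"\s*/\s*\d+\.\d+",  # ABC 1.12 / 1.13
}

# The union: each alternative is wrapped in a named group so we can
# tell afterwards which individual pattern matched.
UNION = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in PATTERNS.items())
)


def classify(raw):
    """Return the name of the format that matches raw, or None."""
    match = UNION.fullmatch(raw.strip())
    return match.lastgroup if match else None


if __name__ == "__main__":
    for example in ("ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"):
        print(example, "->", classify(example))
```

Once the format is known, each one can be handed to its own small piece of code that builds the full codes.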
Upvotes: 0
Reputation: 26150
Are you familiar with regex? That would be an angle worth exploring here. Also, consider splitting on the space character, not just the slash and decimal.
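For instance, a single re.split call can break the input on spaces and slashes at once. This is just a sketch in Python 3, and the helper name tokens is made up:

```python
import re


def tokens(raw):
    """Split raw on runs of whitespace and/or slashes, dropping empty pieces."""
    return [piece for piece in re.split(r"[\s/]+", raw) if piece]


if __name__ == "__main__":
    for example in ("ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"):
        print(example, "->", tokens(example))
```

Note that the letters stay attached to the first number when there is no space between them (as in ABC1.12/13), so a further step is still needed to separate them.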
Upvotes: 1