John

Reputation: 21917

Python string manipulation and processing

I have a number of codes to process. They arrive in several different formats, so I first need to normalise them into the correct format:

Examples of codes:

ABC1.12 - correct format
ABC 1.22 - space between letters and numbers
ABC1.12/13 - 2 codes joined together and leading 1. missing from 13, should be ABC1.12 and ABC1.13 
ABC 1.12 / 1.13 - codes joined together and spaces

I know how to remove the spaces, but I am not sure how to handle the codes that have been joined together. I know I can use the split function to create 2 codes, but I am not sure how to then prepend the letters (and the first number part) to the second code. These are the 3rd and 4th examples in the list above.

WHAT I HAVE SO FAR

    val = "ABC1.12/13"  # the code to process, e.g. the 3rd example above
    retList = [val]
    if "/" in val:
        (code1, code2) = session_codes = val.split("/", 1)

        (initial_letters, numbers) = code1.split(".", 1)
        if initial_letters not in code2:
            code2 = initial_letters + '.' + code2

        # reset list so that it returns both values 
        retList = [code1, code2]

This doesn't handle the split in example 4, because code2 becomes ABC1.1.13 there.

Upvotes: 3

Views: 174

Answers (5)

Surya Kasturi

Reputation: 4828

Take a look at this method. It might be the simplest way to do it.

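val = input()  # Python 3; on Python 2 this was unicode(raw_input())

# Find the first digit: everything before it is the letter prefix.
lastIndex = len(val)  # fallback when the input contains no digit
for aChar in val:
    if aChar.isnumeric():
        lastIndex = val.index(aChar)
        break

part1 = val[:lastIndex].strip()
part2 = val[lastIndex:]

if "/" not in part2:
    # Single code: just join the prefix and the number.
    print(part1 + part2)
else:
    if " " not in part2:
        # Compact form such as 1.12/13: reuse the part before the dot.
        codes = []
        divPart2 = part2.split(".")
        partCodes = divPart2[1].split("/")
        for aPart in partCodes:
            codes.append(part1 + divPart2[0] + "." + aPart)
        print(codes)
    else:
        # Spaced form such as 1.12 / 1.13: each part is already complete.
        codes = []
        divPart2 = part2.split("/")
        for aPart in divPart2:
            aPart = aPart.strip()
            codes.append(part1 + aPart)
        print(codes)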

Upvotes: 0

Abhijit

Reputation: 63707

You can use regex for this purpose

A possible implementation would be as follows

>>> import re
>>> def foo(st):
    parts=st.replace(' ','').split("/")
    parts=list(re.findall("^([A-Za-z]+)(.*)$",parts[0])[0])+parts[1:]
    parts=parts[0:1]+[x.split('.') for x in parts[1:]]
    parts=parts[0:1]+['.'.join(x) if len(x) > 1 else '.'.join([parts[1][0],x[0]]) for x in parts[1:]]
    return [parts[0]+p for p in parts[1:]]

>>> foo('ABC1.12')
['ABC1.12']
>>> foo('ABC 1.22')
['ABC1.22']
>>> foo('ABC1.12/13')
['ABC1.12', 'ABC1.13']
>>> foo('ABC 1.12 / 1.13')
['ABC1.12', 'ABC1.13']
>>> 

Upvotes: 3

Hooked

Reputation: 88118

Using PyParsing

The answer by @Abhijit is a good one, and for this simple problem a regex may be the way to go. However, when dealing with parsing problems, you'll often need a more extensible solution that can grow with your problem. I've found that pyparsing is great for that: you write the grammar, and it does the parsing:

from pyparsing import *

index = Combine(Word(alphas))

# Define what a number is and convert it to a float
number = Combine(Word(nums)+Optional('.'+Optional(Word(nums))))
number.setParseAction(lambda x: float(x[0]))

# What do extra numbers look like?
marker = Word('/').suppress()
extra_numbers = marker + number

# Define what a possible line could be
line_code = Group(index + number + ZeroOrMore(extra_numbers))
grammar = OneOrMore(line_code)

From this definition we can parse the string:

S = '''ABC1.12
ABC 1.22
XXX1.12/13/77/32.
XYZ 1.12 / 1.13
'''
print(grammar.parseString(S))

Giving:

[['ABC', 1.12], ['ABC', 1.22], ['XXX', 1.12, 13.0, 77.0, 32.0], ['XYZ', 1.12, 1.13]]

Advantages:

The numbers are now in the correct format, since they are cast to floats during parsing. Many more "numbers" are handled as well: look at the index "XXX", where values such as 1.12, 13, and 32. are all parsed, regardless of whether they have a decimal part.

Upvotes: 0

mrk

Reputation: 3191

I suggest you write a regular expression for each code pattern and then form a larger regular expression which is the union of the individual ones.
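
As a rough sketch of that idea (not tested beyond the four examples in the question), the three formats could each get their own pattern and then be joined with `|`. The pattern names, the group names, and the `expand_code` helper are all made up for illustration. The group names carry a prefix because Python's `re` does not allow the same group name twice in one pattern.

```python
import re

# Hypothetical per-format patterns. Group names are prefixed (s_, p_, f_)
# so they stay unique once the patterns are joined into one expression.
SINGLE = r"(?P<s_letters>[A-Za-z]+)\s*(?P<s_num>\d+\.\d+)"
SHORT_PAIR = (r"(?P<p_letters>[A-Za-z]+)\s*(?P<p_major>\d+)\.(?P<p_minor>\d+)"
              r"\s*/\s*(?P<p_minor2>\d+)")
FULL_PAIR = (r"(?P<f_letters>[A-Za-z]+)\s*(?P<f_num>\d+\.\d+)"
             r"\s*/\s*(?P<f_num2>\d+\.\d+)")

# The union of the individual patterns, each wrapped in a non-capturing group.
CODE_RE = re.compile("|".join("(?:%s)" % p for p in (FULL_PAIR, SHORT_PAIR, SINGLE)))


def expand_code(raw):
    """Return the list of normalised codes contained in one raw string."""
    match = CODE_RE.fullmatch(raw.strip())
    if match is None:
        raise ValueError("unrecognised code: %r" % raw)
    g = match.groupdict()
    if g["f_letters"]:
        # e.g. "ABC 1.12 / 1.13": both numbers are already complete
        return [g["f_letters"] + g["f_num"], g["f_letters"] + g["f_num2"]]
    if g["p_letters"]:
        # e.g. "ABC1.12/13": reuse the part before the dot for the second code
        prefix = g["p_letters"] + g["p_major"] + "."
        return [prefix + g["p_minor"], prefix + g["p_minor2"]]
    # e.g. "ABC1.12" or "ABC 1.22"
    return [g["s_letters"] + g["s_num"]]


for example in ("ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"):
    print(example, "->", expand_code(example))
```

Only the branch whose groups matched is filled in, so checking which `*_letters` group is set tells you which format was seen. A new format would then mean one more pattern and one more branch.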

Upvotes: 0

Silas Ray

Reputation: 26150

Are you familiar with regex? That would be an angle worth exploring here. Also, consider splitting on the space character, not just the slash and decimal.
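
To make that a bit more concrete, here is a minimal sketch that splits on whitespace and slashes in one go with `re.split`, then fills in the missing leading number. The function name `split_codes` is just a placeholder, and it assumes every code starts with letters followed by a number of the form `1.12`.

```python
import re


def split_codes(val):
    """Split a raw string into normalised codes (sketch, assumes 'ABC1.12'-style input)."""
    # Split on any run of whitespace and/or slashes, dropping empty tokens.
    tokens = [t for t in re.split(r"[\s/]+", val.strip()) if t]
    if not tokens:
        raise ValueError("empty code")

    # The first token holds the letters, possibly with the first number attached.
    head = re.match(r"([A-Za-z]+)(.*)", tokens[0])
    if head is None:
        raise ValueError("code does not start with letters: %r" % val)
    letters, rest = head.groups()

    numbers = ([rest] if rest else []) + tokens[1:]
    if not numbers:
        raise ValueError("no number found in: %r" % val)

    # The part before the dot in the first number is reused for any
    # later number that lacks its own (e.g. the "13" in "ABC1.12/13").
    major = numbers[0].split(".", 1)[0]
    return [letters + (n if "." in n else major + "." + n) for n in numbers]


for example in ("ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"):
    print(example, "->", split_codes(example))
```

Because the space is treated as just another separator, the 2nd and 4th examples from the question need no separate whitespace clean-up step.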

Upvotes: 1
