Split one line to multiple lines based on pattern

Question

I have been working with regex the entire day to parse a complicated string into meaningful data. I've nailed almost everything, but am left with this last problem:

I'm parsing a list of strings that represents a schedule. Every day is a separate item in the list. Some days have multiple appointments, like this line:

Tuesday 10/13/2011 SHIFT 00:00-08:00 Description of appointment DAYOFF 08:00-17:30 08:00-12:30 12:30-13:00 13:00-17:30 Description of appointment NIGHT 17:30-24:00 Description of appointment

I want to split this string into three lines, one per shift, while keeping the day and date on each line. What all shifts have in common is that they consist of letters in caps, so [A-Z].

Expected output would be:

Tuesday 10/13/2011 SHIFT 00:00-08:00 Description of appointment
Tuesday 10/13/2011 DAYOFF 08:00-17:30 08:00-12:30 12:30-13:00 13:00-17:30 Description of appointment
Tuesday 10/13/2011 NIGHT 17:30-24:00 Description of appointment

I can't simply scan for all possible shifts, because they are unknown; the only thing that is certain is that they are in all caps. Therefore I need to use regex.
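
To illustrate the idea, here is a minimal sketch of how an all-caps pattern picks out the shift names from the sample line. The five-letter minimum and the word boundaries are assumptions that fit this sample; they keep the single capitals in "Tuesday" and "Description" from matching.

```python
import re

# Sample line from above, split across string literals for readability.
line = ('Tuesday 10/13/2011 SHIFT 00:00-08:00 Description of appointment '
        'DAYOFF 08:00-17:30 08:00-12:30 12:30-13:00 13:00-17:30 '
        'Description of appointment NIGHT 17:30-24:00 Description of appointment')

# Assumption: a shift is a whole word of five or more capital letters.
shift_names = re.findall(r'\b[A-Z]{5,}\b', line)
print(shift_names)
```

Shifts with names shorter than five letters would need a lower bound in the quantifier.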

I thought of a structure like this (regexmatch = a shift ([A-Z]{5,})):

placeholder = []
for day in schedule:                  # day is the line split into words
    newLine = []
    if day.count(regexmatch) > 1:     # pseudocode: more than one shift on this day
        newLine.append(day[:2])       # To include day and date
        i = 2
        while i < len(day):
            if day[i] == regexmatch:  # pseudocode: this word is a shift
                placeholder.append(newLine)
                newLine = []
                newLine.append(day[:2])
                newLine.append(day[i])
            else:
                newLine.append(day[i])
            i += 1
    placeholder.append(newLine)

I hope this makes sense and someone can help me implement the regexmatch into this, or maybe take an entirely different route.

Gareth Rees · Accepted Answer

I'd organize the code to generate the appointments (instead of repeatedly appending to a list):

import re
day_re = re.compile(r'((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day \d{2}/\d{2}/\d{4}) (.*)')
shift_re = re.compile(r'([A-Z]{5,} [^A-Z]*(?:[A-Z]{1,4}[^A-Z]+)*)')

def appointments(lines):
    """
    Given iterator `lines` containing one or more appointments per day,
    generate individual appointments.
    """
    for line in lines:
        day, remainder = day_re.match(line).groups()
        shifts = shift_re.findall(remainder)
        if shifts:
            for shift in shifts:
                yield '{} {}'.format(day, shift.strip())
        else:
            yield '{} {}'.format(day, remainder.strip())
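
As a usage sketch, the generator can be run on the sample line from the question. The block below restates the patterns compactly so it runs on its own; the weekday alternation uses "Satur" so that "Saturday" matches, and the `schedule` list is just the single example line, not real data. Note that a line which does not start with a weekday and date would make `day_re.match` return `None` and raise an `AttributeError`.

```python
import re

# Patterns follow the answer above; "Satur" is used so that "Saturday" matches.
day_re = re.compile(r'((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day \d{2}/\d{2}/\d{4}) (.*)')
shift_re = re.compile(r'([A-Z]{5,} [^A-Z]*(?:[A-Z]{1,4}[^A-Z]+)*)')

def appointments(lines):
    """Yield one 'day date shift ...' string per shift found in each line."""
    for line in lines:
        day, remainder = day_re.match(line).groups()
        # Fall back to the whole remainder when no all-caps shift is found.
        for shift in shift_re.findall(remainder) or [remainder]:
            yield '{} {}'.format(day, shift.strip())

# The example line from the question, as a one-item schedule.
schedule = [
    'Tuesday 10/13/2011 SHIFT 00:00-08:00 Description of appointment '
    'DAYOFF 08:00-17:30 08:00-12:30 12:30-13:00 13:00-17:30 Description of appointment '
    'NIGHT 17:30-24:00 Description of appointment',
]

result = list(appointments(schedule))
for appointment in result:
    print(appointment)
```

Each shift comes out on its own line, prefixed with the day and date.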
