RolfBly

Reputation: 3882

Python flow control with Flag?

I'm parsing a file in this form. Each record always begins with InvNo and ends with ~EOR~ (End Of Record).

InvNo: 123
Tag1: rat cake
Media: d234
Tag2: rat pudding
~EOR~
InvNo: 5433
Tag1: strawberry tart
Tag5: 's got some rat in it 
~EOR~
InvNo: 345
Tag2: 5
Media: d234
Tag5: rather a lot really
~EOR~

It should become

IN 123
UR blabla
**
IN 345
UR blibli
**

Here UR is a URL, InvNo stays the first tag, and ** is the new end-of-record marker. Records without a Media line are dropped. This works:

impfile = filename[:4]
media = open(filename + '_earmark.dat', 'w')

with open(impfile, 'r') as f: 
    HASMEDIA = False
    recordbuf = ''

    for line in f:
        if 'InvNo: ' in line:
            InvNo = line[line.find('InvNo: ')+7:len(line)]  
            recordbuf = 'IN {}'.format(InvNo)

        if 'Media: ' in line:
            HASMEDIA = True
            mediaref = line[7:len(line)-1]

            URL = getURL(mediaref) # there's more to it, but that's not important now  
            recordbuf += 'UR {}\n'.format(URL)

        if '~EOR~' in line:
            if HASMEDIA:
                recordbuf += '**\n'
                media.write(recordbuf)
                HASMEDIA = False

            recordbuf = ''

media.close()

Is there a better, more Pythonic way? Working with the recordbuffer and the HASMEDIA flag seems, well, old hat. Any examples or tips for good or better practice?

(Also, I'm open to suggestions for a more to-the-point title to this post)

Upvotes: 4

Views: 371

Answers (2)

unutbu

Reputation: 880957

You could initialize InvNo and URL to None, and write a record only when both are truthy:

impfile = filename[:4]
with open(filename + '_earmark.dat', 'w') as media, open(impfile, 'r') as f:
    InvNo = URL = None
    for line in f:
        if line.startswith('InvNo: '):
            # Drop the trailing newline so the output has no blank line after "IN ..."
            InvNo = line[len('InvNo: '):].rstrip('\n')

        if line.startswith('Media: '):
            mediaref = line[7:len(line)-1]
            URL = getURL(mediaref) 

        if line.startswith('~EOR~'):
            if InvNo and URL:
                recordbuf = 'IN {}\nUR {}\n**\n'.format(InvNo, URL)
                media.write(recordbuf)
            InvNo = URL = None

Note: I changed 'InvNo: ' in line to line.startswith('InvNo: ') based on the assumption that InvNo always occurs at the beginning of the line. It appears to be true in your example, but the fact that you use line.find('InvNo: ') suggests that 'InvNo:' might appear anywhere in the line.

If InvNo: appears only at the beginning of the line, then use line.startswith(...) and remove line.find('InvNo: ') (since it would equal 0).

Otherwise, you'll have to retain 'InvNo:' in line and line.find (and of course, the same goes for Media and ~EOR~). The problem with using code like 'Media' in line is that if the Tags can contain anything, it might contain the string 'Media' without being a true field header.
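
To make the difference concrete, here is a minimal sketch. The sample lines and the parse_field helper are made up for illustration and are not part of the original code; the second line is a tag whose value happens to mention "Media: ".

```python
# Hypothetical sample lines: a real Media field, and a tag that merely mentions it.
lines = [
    "Media: d234\n",
    "Tag5: see Media: d999 for details\n",
]


def parse_field(line, prefix):
    """Return the text after prefix if line starts with it, else None."""
    if line.startswith(prefix):
        # Slice by the prefix length instead of a hard-coded 7, and drop the newline.
        return line[len(prefix):].rstrip("\n")
    return None


# The substring test matches both lines; the prefix test matches only the real field.
substring_hits = [line for line in lines if "Media: " in line]
prefix_hits = [line for line in lines if line.startswith("Media: ")]
media_values = [parse_field(line, "Media: ") for line in lines]

print(len(substring_hits), len(prefix_hits), media_values)
```

With startswith, the Tag5 line is ignored, whereas the substring test would treat it as a Media field.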

Upvotes: 3

jgranger

Reputation: 264

Here is a version that avoids slicing. If you ever need to write to the same output file more than once (you may not), open it with 'a' instead of 'w', as done here.

with open('input_file', 'r') as f, open('output.dat', 'a') as media:
    write_to_file = False
    first_line = ''
    for line in f:  # iterate over the file directly; readlines() is not needed
        if line.startswith('InvNo:'):
            first_line = 'IN ' + line.split()[1] + '\n'
        if line.startswith('Media:'):
            write_to_file = True
        if line.startswith('~EOR~') and write_to_file:
            url = 'blabla'  # Put getURL() here
            media.write(first_line + 'UR ' + url + '\n' + '**\n')
            write_to_file = False
            first_line = ''
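
As a small illustration of the 'w' versus 'a' remark, the sketch below writes two records to a scratch file, reopening it each time. The write_twice helper and the temporary file are assumptions for the demo only; the record text is taken from the expected output in the question.

```python
import os
import tempfile


def write_twice(mode):
    """Open a scratch file twice with `mode`, writing one record each time."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    try:
        for record in ("IN 123\nUR blabla\n**\n", "IN 345\nUR blibli\n**\n"):
            with open(path, mode) as media:
                media.write(record)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)  # clean up the scratch file


truncated = write_twice("w")  # each open() truncates, so only the last record survives
appended = write_twice("a")   # each open() appends, so both records are kept

print(repr(truncated))
print(repr(appended))
```

Note that with 'a', rerunning the conversion on the same input adds the records again, so 'w' is likely the safer choice for a one-shot conversion.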

Upvotes: 0
