I'm parsing a file in the following form. Each record always begins with `InvNo`, and `~EOR~` marks the End Of Record.
```
InvNo: 123
Tag1: rat cake
Media: d234
Tag2: rat pudding
~EOR~
InvNo: 5433
Tag1: strawberry tart
Tag5: 's got some rat in it
~EOR~
InvNo: 345
Tag2: 5
Media: d234
Tag5: rather a lot really
~EOR~
```
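To make the record layout concrete, here is a minimal sketch that groups the lines into one dict per record, closing a record at each `~EOR~`. It assumes every field line has the form `Key: value` (split at the first `": "` only, so values may themselves contain colons); the name `iter_records` is hypothetical and not part of the original code.

```python
from typing import Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record; a '~EOR~' line closes the current record.

    Assumes field lines look like 'Key: value'. Lines without ': ' are
    ignored, and if a key repeats within a record the last value wins.
    """
    record: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip('\n')
        if line == '~EOR~':
            if record:
                yield record
            record = {}
        elif ': ' in line:
            key, value = line.split(': ', 1)  # split at the first ': ' only
            record[key] = value
    if record:  # a trailing record with no closing ~EOR~
        yield record


sample = "InvNo: 123\nMedia: d234\n~EOR~\nInvNo: 5433\nTag1: strawberry tart\n~EOR~\n"
for rec in iter_records(sample.splitlines(keepends=True)):
    print(rec['InvNo'], rec.get('Media'))  # 123 d234, then 5433 None
```

With records as dicts, "does this record have media?" becomes `'Media' in rec` instead of a separate flag.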
It should become the following (record 5433, which has no `Media` line, is dropped):
```
IN 123
UR blabla
**
IN 345
UR blibli
**
```
where `UR` is a URL. I want to keep `InvNo` as the first tag, and `**` is now the end-of-record marker. This works:
```python
impfile = filename[:4]
media = open(filename + '_earmark.dat', 'w')
with open(impfile, 'r') as f:
    HASMEDIA = False
    recordbuf = ''
    for line in f:
        if 'InvNo: ' in line:
            # keeps the trailing newline, so 'IN 123\n' ends its own line
            InvNo = line[line.find('InvNo: ')+7:len(line)]
            recordbuf = 'IN {}'.format(InvNo)
        if 'Media: ' in line:
            HASMEDIA = True
            mediaref = line[7:len(line)-1]
            URL = getURL(mediaref) # there's more to it, but that's not important now
            recordbuf += 'UR {}\n'.format(URL)
        if '~EOR~' in line:
            # only records that had a Media line are written out
            if HASMEDIA:
                recordbuf += '**\n'
                media.write(recordbuf)
                HASMEDIA = False
            recordbuf = ''
media.close()
```
Is there a better, more Pythonic way? Working with the `recordbuf` buffer and the `HASMEDIA` flag seems, well, old hat. Any examples or tips for good or better practice?
(Also, I'm open to suggestions for a more to-the-point title to this post)
Answer:
You could set `InvNo` and `URL` to `None` initially, and only write a record when `InvNo` and `URL` are both truthy:
```python
impfile = filename[:4]
with open(filename + '_earmark.dat', 'w') as media, open(impfile, 'r') as f:
    InvNo = URL = None
    for line in f:
        if line.startswith('InvNo: '):
            # strip the newline so the output has no blank line after 'IN ...'
            InvNo = line[line.find('InvNo: ')+7:].rstrip('\n')
        if line.startswith('Media: '):
            mediaref = line[7:].rstrip('\n')
            URL = getURL(mediaref)  # the asker's own helper
        if line.startswith('~EOR~'):
            if InvNo and URL:
                recordbuf = 'IN {}\nUR {}\n**\n'.format(InvNo, URL)
                media.write(recordbuf)
            InvNo = URL = None  # reset for the next record
```
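Wrapping the same `None`-sentinel approach in a function that takes any iterable of lines and any writable file object makes it easy to try out without touching the disk. This is a sketch under assumptions: `convert` is a hypothetical name, and `get_url` is passed in as a stand-in for the asker's `getURL`, whose real behaviour isn't shown. It uses `elif` because a line can only start with one of the three prefixes.

```python
import io
from typing import Callable, Iterable, Optional, TextIO


def convert(lines: Iterable[str], out: TextIO, get_url: Callable[[str], str]) -> None:
    """Write an IN/UR/** block for each record that has both InvNo and Media."""
    inv_no: Optional[str] = None
    url: Optional[str] = None
    for line in lines:
        if line.startswith('InvNo: '):
            inv_no = line[len('InvNo: '):].rstrip('\n')
        elif line.startswith('Media: '):
            url = get_url(line[len('Media: '):].rstrip('\n'))
        elif line.startswith('~EOR~'):
            if inv_no and url:
                out.write('IN {}\nUR {}\n**\n'.format(inv_no, url))
            inv_no = url = None  # reset for the next record


# In-memory input/output and a hypothetical stand-in for getURL.
source = io.StringIO("InvNo: 123\nMedia: d234\n~EOR~\nInvNo: 5433\nTag1: strawberry tart\n~EOR~\n")
target = io.StringIO()
convert(source, target, get_url=lambda ref: 'http://example.invalid/' + ref)
print(target.getvalue())
```

Passing the URL lookup in as a parameter keeps the parsing logic independent of however the real lookup works.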
Note: I changed `'InvNo: ' in line` to `line.startswith('InvNo: ')` based on the assumption that `InvNo` always occurs at the beginning of the line. It appears to be true in your example, but the fact that you use `line.find('InvNo: ')` suggests that `'InvNo:'` might appear anywhere in the line.

If `InvNo:` appears only at the beginning of the line, then use `line.startswith(...)` and remove `line.find('InvNo: ')` (since it would equal 0). Otherwise, you'll have to retain `'InvNo:' in line` and `line.find` (and of course, the same goes for `Media` and `~EOR~`).
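When the prefix is known to start the line, the value can be sliced off by the prefix's length alone. A minimal sketch; `field_value` is a hypothetical helper, not part of either original snippet:

```python
from typing import Optional


def field_value(line: str, prefix: str) -> Optional[str]:
    """Return the text after `prefix` if `line` starts with it, else None."""
    if line.startswith(prefix):
        # startswith() implies line.find(prefix) == 0, so the value begins
        # at index len(prefix); no find() call is needed.
        return line[len(prefix):].rstrip('\n')
    return None


print(field_value('InvNo: 123\n', 'InvNo: '))   # 123
print(field_value('Media: d234\n', 'InvNo: '))  # None
```

Using `len(prefix)` instead of the literal `7` also avoids miscounting if a prefix ever changes.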
The problem with using code like `'Media' in line` is that if the tags can contain anything, a tag's value might contain the string `'Media'` without it being a true field header.
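A quick illustration of that false positive, using a made-up tag value (the `Tag1` line below is hypothetical, not from the question's data):

```python
header = 'Media: d234\n'
tag = 'Tag1: see Media: d234 for the photo\n'  # hypothetical tag value

for line in (header, tag):
    # substring test vs. prefix test on the same line
    print('in:', 'Media: ' in line, '| startswith:', line.startswith('Media: '))
# in: True | startswith: True
# in: True | startswith: False
```

The substring test treats the tag line as a `Media` header; the prefix test does not.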
Answer:
Here is a version that doesn't slice. It opens the output file in append mode (`'a'`), which is what you want if you ever need to write to the same output file again; if you don't, use `'w'` instead.
```python
with open('input_file', 'r') as f, open('output.dat', 'a') as media:
    write_to_file = False
    first_line = ''
    for line in f:  # iterate lazily; f.readlines() isn't needed
        if line.startswith('InvNo:'):
            first_line = 'IN ' + line.split()[1] + '\n'
        if line.startswith('Media:'):
            write_to_file = True
        if line.startswith('~EOR~') and write_to_file:
            url = 'blabla'  # put getURL() here
            media.write(first_line + 'UR ' + url + '\n' + '**\n')
            write_to_file = False
            first_line = ''
```
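The difference between `'a'` and `'w'` shows up when the script runs more than once. A small sketch that writes the question's sample record twice to a temporary file in each mode (the temporary path is only for illustration):

```python
import os
import tempfile

record = 'IN 123\nUR blabla\n**\n'  # sample output record from the question

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.dat')

    for _ in range(2):                # simulate running the script twice
        with open(path, 'a') as out:  # 'a': keep existing content, add at the end
            out.write(record)
    with open(path) as f:
        appended = f.read()

    for _ in range(2):
        with open(path, 'w') as out:  # 'w': truncate the file, then write
            out.write(record)
    with open(path) as f:
        overwritten = f.read()

print(appended.count('**'), overwritten.count('**'))  # 2 1
```

With `'a'`, every run adds another copy of the records; with `'w'`, each run starts from an empty file.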