I am trying to make some changes to the contents of an input file. The input file looks like this:
18800000 20400000 pau
20400000 21300000 aa
21300000 22500000 p
22500000 23200000 l
23200000 24000000 ay
24000000 25000000 k
25000000 26500000 pau
This file is a transcription of an audio file. The first number is the start time, the second is the end time, and the letters that follow denote the sound.
The change I have to make is this: a few of the sounds are made up of two different sounds, i.e. some of them are diphthongs. These diphthongs have to be split into their two component sounds. In the example above the diphthong is 'ay', which is made of 'ao' and 'ih'. The duration of 'ay', 24000000 - 23200000 = 800000, is divided equally between these two sounds, so
23200000 24000000 ay
changes to
23200000 23600000 ao
23600000 24000000 ih
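The splitting rule above can be sketched as a small helper. This is a minimal sketch, not code from the thread: the names `DIPHTHONGS` and `split_segment` are hypothetical, only the 'ay' -> 'ao' + 'ih' entry comes from the question, and it assumes integer times split at the midpoint with floor division.

```python
# Mapping from a diphthong to its two component sounds. Only 'ay' -> ('ao', 'ih')
# comes from the question; further entries would have to be added (assumption).
DIPHTHONGS = {"ay": ("ao", "ih")}


def split_segment(start, end, sound):
    """Return the segment(s) that replace (start, end, sound).

    A diphthong is split at the midpoint (floor division keeps the times as
    integers, so any odd remainder goes to the second half); every other
    sound is returned unchanged as a single segment.
    """
    if sound not in DIPHTHONGS:
        return [(start, end, sound)]
    first, second = DIPHTHONGS[sound]
    mid = start + (end - start) // 2
    return [(start, mid, first), (mid, end, second)]


print(split_segment(23200000, 24000000, "ay"))
# → [(23200000, 23600000, 'ao'), (23600000, 24000000, 'ih')]
print(split_segment(24000000, 25000000, "k"))
# → [(24000000, 25000000, 'k')]
```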
I have attempted to write some pseudocode, but it looks rubbish:
def test(transcriptionFile) :
    with open("transcriptions.txt", "r+") as tFile :
        for line in tFile :
            if 3rd_item = ay
                duration = (2nd_item[1] - 1st_item[2]) / 2
                delete the line
                tFile.write(1st_item, 1st_item + d, ao)
                tfile.write(1st_item + d, 1st_item, ih) # next line

if __name__ == "__main__" :
    test("transcriptions.txt")
Thank you.
Based on the suggestions I was given, I changed the code to the following, but it is still not correct:
def test(transcriptionFile) :
    with open("transcriptions.txt", "r") as tFile :
        inp = tFile.readlines()
    outp = []
    for ln in inp :
        start, end, sound = ln.strip()
        if sound == ay :
            duration = (end - start) / 2
            ln.delete
            start = start
            end = start + duration
            sound = ao
            outp.append(ln)
            start = start + duration # next line
            end = start
            sound = ih
            outp.append(ln)
    with open("transcriptions.txt", "w") as tFile:
        tFile.writelines(outp)
__name__ == "__main__"
test("transcriptions.txt")
The following script should do what you want:
import sys

def main(src, dest):
    with open(dest, 'w') as output:
        with open(src) as source:
            for line in source:
                # Skip lines that do not have exactly three fields (e.g. blank lines).
                try:
                    start, end, sound = line.split()
                except ValueError:
                    continue
                if sound == 'ay':
                    # Split the 'ay' segment into two halves: 'ao' then 'ih'.
                    start = int(start)
                    end = int(end)
                    offset = (end - start) // 2
                    output.write('%s %s ao\n' % (start, start + offset))
                    output.write('%s %s ih\n' % (start + offset, end))
                else:
                    output.write(line)

# Usage: python script.py input.txt output.txt
if __name__ == "__main__":
    main(*sys.argv[1:])
Output:
18800000 20400000 pau
20400000 21300000 aa
21300000 22500000 p
22500000 23200000 l
23200000 23600000 ao
23600000 24000000 ih
24000000 25000000 k
25000000 26500000 pau
Editing a text file in-place is pretty hard. Your best options are:
1. Write the program as a Unix filter, i.e. produce the new file on sys.stdout and put it in place with external tools.
2. Read in the whole file, then construct the new file in memory and write it out.
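The first option could look something like the following minimal sketch. It assumes the script is run as, for example, `python split_ay.py < transcriptions.txt > new.txt && mv new.txt transcriptions.txt`; the script name and the function names `split_lines` and `main` are hypothetical, and only the 'ay' -> 'ao' + 'ih' split from the question is handled.

```python
import sys


def split_lines(lines):
    """Yield output lines, replacing each 'ay' segment with 'ao' and 'ih' halves."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[2] == "ay":
            start, end = int(fields[0]), int(fields[1])
            mid = start + (end - start) // 2
            yield "%d %d ao\n" % (start, mid)
            yield "%d %d ih\n" % (mid, end)
        else:
            # Pass every other line (including blank ones) through unchanged.
            yield line


def main():
    # Filter mode: read the transcription from stdin, write the result to stdout.
    sys.stdout.writelines(split_lines(sys.stdin))
```

To run it as a script, call `main()` from the usual `if __name__ == "__main__":` guard at the bottom of the file. The shell redirection does the file handling, so the Python code never has to open, truncate or rename anything itself.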
A program following the second approach would look like this:
# read transcriptions.txt into a list of lines
with open("transcriptions.txt", "r") as tFile:
    inp = tFile.readlines()

# do processing and build a new list of lines
# (to_be_deleted and transform stand for your own processing logic)
outp = []
for ln in inp:
    if not to_be_deleted(ln):
        outp.append(transform(ln))

# now overwrite transcriptions.txt
with open("transcriptions.txt", "w") as tFile:
    tFile.writelines(outp)
It would be even better to write the processing step as a list comprehension:
outp = [transform(ln) for ln in inp
        if not to_be_deleted(ln)]
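In this particular problem one input line can become two output lines, so a strictly one-to-one `transform` does not quite fit. A minimal sketch, assuming a hypothetical `transform` that returns a list of zero or more output lines, flattens the results with a nested comprehension; returning an empty list would also cover the `to_be_deleted` case.

```python
def transform(ln):
    """Return the output lines for one input line (hypothetical helper).

    An 'ay' line becomes two lines ('ao' then 'ih'); any other line is
    returned unchanged. Returning [] would drop the line entirely.
    """
    fields = ln.split()
    if len(fields) == 3 and fields[2] == "ay":
        start, end = int(fields[0]), int(fields[1])
        mid = start + (end - start) // 2
        return ["%d %d ao\n" % (start, mid), "%d %d ih\n" % (mid, end)]
    return [ln]


inp = ["22500000 23200000 l\n", "23200000 24000000 ay\n", "24000000 25000000 k\n"]
# Flatten: each input line may contribute several output lines.
outp = [out for ln in inp for out in transform(ln)]
print("".join(outp), end="")
```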