Reputation: 105537
I have a TextGrid file generated by Prosodylab-Aligner, which I can open in Praat. Is there a way to get a text file out of it that looks like this:
Word in text | Pronunciation started at
Hello 0:0:0.000
my 0:0:1.125
friends 0:0:2.750
EDIT
Attached textGrid file:
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0.0
xmax = 2.53
tiers? <exists>
size = 2
item []:
item [1]:
class = "IntervalTier"
name = "phones"
xmin = 0.0
xmax = 2.53
intervals: size = 13
intervals [1]:
xmin = 0.0
xmax = 0.62
text = "sil"
intervals [2]:
xmin = 0.62
xmax = 0.78
text = "K"
intervals [3]:
xmin = 0.78
xmax = 0.81
text = "L"
intervals [4]:
xmin = 0.81
xmax = 0.92
text = "IH1"
intervals [5]:
xmin = 0.92
xmax = 1.02
text = "K"
intervals [6]:
xmin = 1.02
xmax = 1.07
text = ""
intervals [7]:
xmin = 1.07
xmax = 1.22
text = "T"
intervals [8]:
xmin = 1.22
xmax = 1.31
text = "UW1"
intervals [9]:
xmin = 1.31
xmax = 1.51
text = "S"
intervals [10]:
xmin = 1.51
xmax = 1.67
text = "T"
intervals [11]:
xmin = 1.67
xmax = 1.85
text = "AA1"
intervals [12]:
xmin = 1.85
xmax = 1.88
text = "P"
intervals [13]:
xmin = 1.88
xmax = 2.53
text = "sil"
item [2]:
class = "IntervalTier"
name = "words"
xmin = 0.0
xmax = 2.53
intervals: size = 6
intervals [1]:
xmin = 0.0
xmax = 0.62
text = "sil"
intervals [2]:
xmin = 0.62
xmax = 1.02
text = "CLICK"
intervals [3]:
xmin = 1.02
xmax = 1.07
text = "sp"
intervals [4]:
xmin = 1.07
xmax = 1.31
text = "TO"
intervals [5]:
xmin = 1.31
xmax = 1.88
text = "STOP"
intervals [6]:
xmin = 1.88
xmax = 2.53
text = "sil"
Upvotes: 1
Views: 2744
Reputation: 2098
Since this is a Praat file, and you say you can open it in Praat, I think a better solution is to use Praat itself. A script like the following involves far fewer leaps of faith:
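# Ask for the TextGrid path and the tier to list (tier 2 is "words" in the example file)
form Parse TextGrid...
    sentence File /path/to/your.TextGrid
    integer Tier 2
endform
Read from file: file$
intervals = Get number of intervals: tier
# Header line, then one tab-separated line per labelled interval
writeInfoLine: "Word in text", tab$, "Pronunciation started at"
for i to intervals
    label$ = Get label of interval: tier, i
    # Skip intervals with an empty label
    if label$ != ""
        # Start time of the interval, in seconds
        start = Get start point: tier, i
        appendInfoLine: label$, tab$, string$(start)
    endif
endfor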
If you save that as a script somewhere, you can then call Praat from the command line like
praat /path/to/your/script.praat "/path/to/your.TextGrid" 2
and get the desired output on stdout.
You could also run it manually, and maybe use this to write your file.
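The script above prints start times as plain seconds, while the question asks for something like 0:0:1.125. A minimal Python sketch of that conversion follows, assuming the output is the tab-separated "label, seconds" text the script writes. The names `format_timestamp` and `convert_praat_output` are made up for illustration and are not part of Praat.

```python
def format_timestamp(seconds):
    """Format a time in seconds as h:m:s.mmm, e.g. 1.125 -> '0:0:1.125'."""
    # Work in whole milliseconds to avoid floating-point drift.
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours}:{minutes}:{secs}.{millis:03d}"


def convert_praat_output(text):
    """Rewrite 'label<TAB>seconds' lines; other lines pass through unchanged."""
    converted = []
    for line in text.splitlines():
        label, _, value = line.partition("\t")
        try:
            converted.append(f"{label} {format_timestamp(float(value))}")
        except ValueError:
            # Header line, or anything whose second field is not a number.
            converted.append(line)
    return converted
```

The text captured from stdout can be passed to `convert_praat_output`, and the returned lines written to a file.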
Upvotes: 0
Reputation: 2306
The syntax of TextGrid files is a little bit odd. For your restricted purpose, a list of the words and their starting points, your parser could be quite simple:
1. Find the text line containing 8 spaces and the string 'name = "words"'.
2. Inspect all following lines and stop at the next occurrence of 8 spaces and the string 'name = "'.
2a. Save the floating-point numbers immediately following 12 spaces and the string 'xmin = '.
2b. Save the strings immediately following 12 spaces and the string 'text = '.
The result of this procedure would be:
0.0 0.62 1.02 1.07 1.31 1.88
"sil" "CLICK" "sp" "TO" "STOP" "sil"
Now just pair up these two arrays element by element and you will have your table (the numbers are the starting points, given in seconds).
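The procedure above could be sketched in Python roughly as follows. This is only an illustration: the function name `words_with_start_times` is made up, and instead of counting exactly 8 or 12 leading spaces it strips the indentation and pairs each 'text = ' line with the most recent 'xmin = ' line, which is an assumption that holds for the long TextGrid format shown in the question.

```python
import re


def words_with_start_times(textgrid_text, tier_name="words"):
    """Return (label, start_seconds) pairs for one tier of a long-format TextGrid."""
    rows = []
    in_tier = False
    current_xmin = None
    for raw_line in textgrid_text.splitlines():
        line = raw_line.strip()
        name_match = re.match(r'name = "(.*)"$', line)
        if name_match:
            # A 'name = ' line starts a new tier; keep only the requested one.
            in_tier = name_match.group(1) == tier_name
            current_xmin = None
            continue
        if not in_tier:
            continue
        if line.startswith("xmin = "):
            # The tier's own xmin is overwritten by the first interval's xmin
            # before any 'text = ' line is reached.
            current_xmin = float(line.split("=", 1)[1])
        elif line.startswith("text = ") and current_xmin is not None:
            label = line.split("=", 1)[1].strip().strip('"')
            rows.append((label, current_xmin))
    return rows
```

Pass the file's contents as a string. Entries such as "sil" and "sp" are returned like any other label, so filter them out afterwards if only real words are wanted.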
Keep in mind that "sil" is an abbreviation for the meta tag "silence" and "sp" for "speech pause". While the silence at the beginning and end of an utterance is expected, the speech pause might be wrong because the plosive /t/ of the word "TO" starts with an articulatory occlusion, which is pretty similar to a speech pause, but part of the plosive.
Upvotes: 1