Reputation: 235
Does anyone have experience with reading WebVTT (.vtt) files using PHP?
I'm developing an application in CakePHP where I need to read through a bunch of vtt files and get the start time and associated text.
So as an example of the file:
00:00.999 --> 00:04.999
sentence one

00:04.999 --> 00:07.999
sentence two

00:07.999 --> 00:10.999
third sentence
with a line break

00:10.999 --> 00:14.999
a fourth sentence
on three
lines
I need to be able to extract something like this:
00:00.999 sentence one
00:04.999 sentence two
00:07.999 third sentence with a line break
00:10.999 a fourth sentence on three lines
Note that there can be line breaks so there's no set number of lines between each timestamp.
My plan was to search for "-->" which is a common string between each timestamp. Does anyone have any ideas how best to achieve this?
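A rough sketch of that plan, assuming the whole file fits in a string: WebVTT separates cues with blank lines, so one option is to split on those and then look for the "-->" line inside each block. The function name `vtt_cues_from_string` and the `start`/`text` keys are made up for illustration.

```php
<?php
// Hypothetical helper: returns a list of ['start' => ..., 'text' => ...] rows.
function vtt_cues_from_string(string $vtt): array
{
    $cues = [];
    // Normalise line endings so blank-line detection works on any platform.
    $vtt = str_replace(["\r\n", "\r"], "\n", $vtt);
    // Split into blocks separated by one or more blank lines.
    foreach (preg_split('/\n\s*\n/', trim($vtt)) as $block) {
        $lines = explode("\n", $block);
        foreach ($lines as $i => $line) {
            if (strpos($line, '-->') !== false) {
                // Everything before "-->" is the start time.
                $start = trim(explode('-->', $line)[0]);
                // Every line after the timing line is caption text.
                $text = implode(' ', array_map('trim', array_slice($lines, $i + 1)));
                $cues[] = ['start' => $start, 'text' => $text];
                break;
            }
        }
        // Blocks without "-->" (such as the WEBVTT header) are skipped.
    }
    return $cues;
}
```

Because the text is whatever follows the timing line within a block, the number of caption lines per cue does not matter.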
Upvotes: 1
Views: 4411
Reputation: 4150
To parse the file, you can use a library like this:
$subtitles = Subtitles::loadFromFile('subtitles.vtt');
$blocks = $subtitles->getInternalFormat(); // array
foreach ($blocks as $block) {
    echo $block['start'];
    echo ' ';
    foreach ($block['lines'] as $line) {
        echo $line . ' ';
    }
    echo "\n";
}
It also extracts the text from files that contain styling or small formatting errors.
https://github.com/mantas-done/subtitles
Upvotes: 2
Reputation: 235
This seems to achieve what I need, i.e. outputs the Start Time and any subsequent lines of text. The files I'm using are fairly small so using PHP's file() function to read everything into an array seems ok; not sure this would work well on large files though.
$file = 'test.vtt';
// Read the file into an array of lines, dropping newlines and blank lines.
$file_as_array = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($file_as_array as $f) {
    // Find lines containing "-->" and capture the start time.
    if (preg_match("/^(\d{2}:[\d\.]+) --> \d{2}:[\d\.]+$/", $f, $match)) {
        echo '<br>';
        echo $match[1];
        continue;
    }
    // Any other line is caption text, except the file header, which contains the word 'WEBVTT'.
    // strpos() returns 0 for a match at the start of the line, so compare against false.
    if (strpos($f, 'WEBVTT') === false) {
        echo ' ' . $f . ' ';
    }
}
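On the large-file concern: a minimal sketch of a streaming variant, assuming the same MM:SS.mmm timing pattern as above. It reads one line at a time with fgets() and yields each cue from a generator, so the whole file is never held in memory. The function name `vtt_iterate_cues` and the `start`/`text` keys are hypothetical.

```php
<?php
/**
 * Hypothetical helper: lazily yields ['start' => ..., 'text' => ...] for each
 * cue read from an already-open stream.
 */
function vtt_iterate_cues($stream): Generator
{
    $start = null;
    $text  = [];
    while (($line = fgets($stream)) !== false) {
        $line = trim($line);
        // Same timing pattern as in the snippet above (MM:SS.mmm only).
        if (preg_match('/^(\d{2}:[\d.]+) --> \d{2}:[\d.]+$/', $line, $m)) {
            if ($start !== null) {
                // A new cue begins, so the previous one is complete.
                yield ['start' => $start, 'text' => implode(' ', $text)];
            }
            $start = $m[1];
            $text  = [];
        } elseif ($start !== null && $line !== '') {
            // Caption text; lines before the first cue (e.g. "WEBVTT") are skipped.
            $text[] = $line;
        }
    }
    if ($start !== null) {
        // Emit the last cue once the stream ends.
        yield ['start' => $start, 'text' => implode(' ', $text)];
    }
}

// Usage sketch ('test.vtt' is the file name used above):
// $fh = fopen('test.vtt', 'r');
// foreach (vtt_iterate_cues($fh) as $cue) {
//     echo $cue['start'], ' ', $cue['text'], "\n";
// }
// fclose($fh);
```

One limitation of this sketch: if a file uses cue identifiers (a line of text just before a timing line), that identifier would be appended to the previous cue's text.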
Upvotes: 1
Reputation: 2691
You can do something like this:
<?PHP
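function send_reformatted($vtt_file){
    // Add these headers to make it easy to save the output as a text file
    header("Content-type: text/plain");
    header('Content-Disposition: inline; filename="'.basename($vtt_file).'.txt"');
    $f = fopen($vtt_file, "r");
    if ($f === false) return;
    $line_new = "";
    while (($line = fgets($f)) !== false) {
        $line = trim($line); // also strips "\r" from Windows line endings
        if (preg_match("/^(\d{2}:[\d\.]+) --> \d{2}:[\d\.]+$/", $line, $match)) {
            // A new cue starts: flush the previous one
            if ($line_new) echo $line_new."\n";
            $line_new = $match[1];
        } elseif ($line_new && $line !== "") {
            // Caption text; anything before the first cue (e.g. the WEBVTT header) is skipped
            $line_new .= " $line";
        }
    }
    if ($line_new) echo $line_new."\n";
    fclose($f);
}
send_reformatted("test.vtt");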
?>
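Note that the pattern used above only matches MM:SS.mmm timestamps with nothing after the end time. WebVTT also allows an hours part (HH:MM:SS.mmm) and cue settings such as "align:start" after the end timestamp. A minimal sketch of a more tolerant check, where the function name `vtt_start_time` is hypothetical:

```php
<?php
/**
 * Hypothetical helper: returns the start timestamp if $line is a WebVTT cue
 * timing line, or null otherwise.
 */
function vtt_start_time(string $line): ?string
{
    // A timestamp is [hh:]mm:ss.ttt, where the hours part is optional.
    $ts = '(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}';
    // Start time, "-->" surrounded by spaces or tabs, end time, optional cue settings.
    $pattern = '/^(' . $ts . ')[ \t]+-->[ \t]+' . $ts . '(?:[ \t]+.*)?$/';
    return preg_match($pattern, trim($line), $m) === 1 ? $m[1] : null;
}
```

It could replace the preg_match() call in the loop above: a non-null return value marks the start of a new cue.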
Upvotes: 0