I'm trying to write a Ruby script, but I'm struggling with how to proceed.
The goal of my script is to take two subtitle files and produce a third file that combines the timings of the first file with the subtitles (text) of the second.
I don't know if I'm being clear. Basically, my two subtitle files are the same, but they come from different versions, i.e., the timings may differ for the same sequences.
Here is the desired result:
# File One: KILLERS version (English)
1
00:00:01,874 --> 00:00:05,577
<i>Previously on "12 Monkeys"...</i>

2
00:00:05,625 --> 00:00:07,882
- Did Marion send you?
- Who?

3
00:00:07,938 --> 00:00:09,905
Marion, the boy's mother.
# File Two: AMZN version (French translation)
1
00:00:08,140 --> 00:00:11,850
<i>Précédemment...</i>

2
00:00:12,260 --> 00:00:14,120
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:14,150 --> 00:00:16,110
Marion, sa mère.
# File Three: Objective -> a KILLERS version of the French subtitles
1
00:00:01,874 --> 00:00:05,577
<i>Précédemment...</i>

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.
We can call the timing (00:00:07,938, for instance) "time" and the subtitle itself "text".
Therefore, I want to replace every "time line" in the second file with the corresponding time line from the first file.
Here is what I tried:
#!/usr/bin/ruby -w
# To run the script: ./script.rb TVshowENG.srt TVshowESP.srt NewTVshow.srt
def time_srt(h, m, s, ms)
  h + ":" + m + ":" + s + "," + ms
end

File.open(ARGV[0], 'r') do |file_one|
  File.open(ARGV[1], 'r') do |file_two|
    File.open(ARGV[2], 'w') do |file_out|
      file_one.each_line do |line|
        if line.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
          file_out << time_srt($1, $2, $3, $4) + ' --> ' +
                      time_srt($5, $6, $7, $8) + "\n"
        else
          file_out << line # I want to print the line from file_two here
        end
      end
    end
  end
end
Edit: random glitches in the final script
1. Script running issue: the script runs, then suddenly stops with this error:
./SyncScript.rb:25:in `block (4 levels) in <main>': invalid byte sequence in UTF-8 (ArgumentError)
from ./SyncScript.rb:18:in `loop'
from ./SyncScript.rb:18:in `block (3 levels) in <main>'
from ./SyncScript.rb:17:in `open'
from ./SyncScript.rb:17:in `block (2 levels) in <main>'
from ./SyncScript.rb:16:in `open'
from ./SyncScript.rb:16:in `block in <main>'
from ./SyncScript.rb:15:in `open'
from ./SyncScript.rb:15:in `<main>'
2. Glitches (details in the # comments):
14 # From 1 to 16, it worked perfectly fine
00:00:38,535 --> 00:00:40,832
Elle est revenue !
15
00:00:49,746 --> 00:00:51,620
<i>Ce que je vais te dire</i>
16
00:00:51,715 --> 00:00:54,749
<i>est la légende telle
que l'on me l'a racontée.</i>
00:01:01,650 --> 00:01:04,190 # Suddenly, there's no sequence number (here, 17)
<i>Il y avait autrefois un serpent</i>
00:01:04,300 --> 00:01:06,870 # Same here (no 18)
<i>qui n'allait
que dans une seule direction.</i>
19 # Yet 19 is there. But there's no blank line before it ('\n' issue)
00:01:07,150 --> 00:01:10,740
<i>Toujours en avant, jamais en arrière.</i>
20 # Same...
00:01:11,020 --> 00:01:13,080
<i>Jusqu'au jour où,</i>
21
00:01:13,260 --> 00:01:16,770
<i>le serpent tomba sur un démon.</i>
22
00:01:21,670 --> 00:01:22,730
Stop !
23
00:01:38,050 --> 00:01:40,050
Ho ! Ho !
24
00:01:59,030 --> 00:02:01,090
Je suis sûr que vous ne serez pas
surpris d'apprendre que
25
00:02:01,130 --> 00:02:04,310
nous avons vu votre venue
26
00:02:06,380 --> 00:02:11,020 # No subtitles here??
27
00:02:11,690 --> 00:02:13,390
This is a more Ruby-like way to perform the desired substitutions.
Code
def substitute(french_fname, english_fname)
  r = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/
  english_lines = File.read(english_fname).scan(r)
  File.read(french_fname).gsub(r) { english_lines.shift }
end
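One caveat worth noting: if the English file has fewer timing lines than the French one, english_lines.shift returns nil, String#gsub inserts an empty string in its place, and those timing lines silently disappear. A hypothetical guarded variant (substitute_checked is not from the original answer) that takes the file contents rather than file names, so it is easy to test, might look like this:

```ruby
# Hypothetical variant of `substitute`: works on the two files' contents
# and refuses to substitute when the number of timing lines differs,
# instead of silently replacing the missing ones with "".
TIMING_RE = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/

def substitute_checked(french_content, english_content)
  english_lines = english_content.scan(TIMING_RE)
  french_count  = french_content.scan(TIMING_RE).size
  if english_lines.size != french_count
    raise ArgumentError,
          "timing line counts differ: #{english_lines.size} English vs #{french_count} French"
  end
  # Each French timing line is replaced, in order, by the next English one.
  french_content.gsub(TIMING_RE) { english_lines.shift }
end
```

It would be called as, e.g., substitute_checked(File.read(FRENCH_FNAME), File.read(ENGLISH_FNAME)).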
Example
Here are the contents of your two files, written as heredocs (search for “Here Document” in the Ruby documentation).
english =<<IGNOMINIOUS_END
1
00:00:01,874 --> 00:00:05,577
<i>Previously on "12 Monkeys"...</i>

2
00:00:05,625 --> 00:00:07,882
- Did Marion send you?
- Who?

3
00:00:07,938 --> 00:00:09,905
Marion, the boy's mother.
IGNOMINIOUS_END
and
french =<<BITTER_END
1
00:00:08,140 --> 00:00:11,850
<i>Précédemment...</i>

2
00:00:12,260 --> 00:00:14,120
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:14,150 --> 00:00:16,110
Marion, sa mère.
BITTER_END
Now let’s create the two files, which I’ll name “english” and “french”.
ENGLISH_FNAME = "english"
FRENCH_FNAME = "french"
File.write(ENGLISH_FNAME, english)
#=> 191
File.write(FRENCH_FNAME, french)
#=> 182
We may now use the method substitute to compute the contents of the French file, revised as desired (a string, which could of course be written to a file).
puts substitute(FRENCH_FNAME, ENGLISH_FNAME)
1
00:00:01,874 --> 00:00:05,577
<i>Précédemment...</i>

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.
Explanation
If the two files were huge, one would want to read them line by line, but it's reasonable to assume they are quite modest in size (even if the running time of the film were, say, 1,000 hours), so I will gulp them into strings.
english_content = File.read(ENGLISH_FNAME)
french_content = File.read(FRENCH_FNAME)
See IO::read and recall that File.superclass #=> IO.
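If the files really were too large to read whole, a line-by-line version could look roughly like the following. This is a minimal sketch, not part of the original answer: substitute_streaming and next_timing_line are hypothetical names, and it assumes the same timing-line format as above. It streams the French file with File.foreach and, each time it meets a French timing line, reads forward in the English file with IO#gets to the next timing line. The out argument can be anything that supports <<, such as a String or a File opened for writing.

```ruby
# Hypothetical streaming variant: neither file is read into memory at once.
TIMING_LINE = /\A\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\z/

# Read forward in `io` until the next timing line; raise if none is left.
def next_timing_line(io)
  while (line = io.gets)
    line = line.chomp
    return line if line.match?(TIMING_LINE)
  end
  raise ArgumentError, "the English file ran out of timing lines"
end

def substitute_streaming(french_fname, english_fname, out)
  File.open(english_fname) do |english|
    File.foreach(french_fname) do |line|
      if line.chomp.match?(TIMING_LINE)
        out << next_timing_line(english) << "\n" # English timing, "\n" ending
      else
        out << line # French text, sequence numbers and blank lines unchanged
      end
    end
  end
  out
end
```

For example: File.open("merged.srt", "w") { |f| substitute_streaming(FRENCH_FNAME, ENGLISH_FNAME, f) }, where merged.srt is an arbitrary output name.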
Next, let's use String#scan with a regular expression to extract an array of the replacement (timing) lines from english_content.
r = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/
english_lines = english_content.scan(r)
#=> ["00:00:01,874 --> 00:00:05,577",
# "00:00:05,625 --> 00:00:07,882",
# "00:00:07,938 --> 00:00:09,905"]
Note that "^" and "$" are beginning-of-line and end-of-line anchors, not to be confused with the beginning-of-string ("\A") and end-of-string ("\z") anchors.
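A quick illustration of the difference, using the timing-line pattern from above: with ^ and $ the pattern finds the timing line inside a multi-line string, while with \A and \z it only matches a string consisting of nothing but a timing line. The last two lines add a related detail: $ matches only before "\n", so if a subtitle file happens to use Windows ("\r\n") line endings, the ^...$ pattern will not match its timing lines unless the "\r" is removed first.

```ruby
text = "1\n00:00:01,874 --> 00:00:05,577\n<i>Précédemment...</i>\n"

line_re   = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/   # line anchors
string_re = /\A\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\z/ # string anchors

p text.scan(line_re)   # => ["00:00:01,874 --> 00:00:05,577"]
p text.scan(string_re) # => [] (the whole string is not a single timing line)
p "00:00:01,874 --> 00:00:05,577".match?(string_re) # => true

# With Windows line endings, "$" sees "\r" before "\n", so nothing matches:
p "00:00:01,874 --> 00:00:05,577\r\n".scan(line_re)               # => []
p "00:00:01,874 --> 00:00:05,577\r\n".delete("\r").scan(line_re)  # => ["00:00:01,874 --> 00:00:05,577"]
```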
Lastly, we use the method String#gsub with the same regular expression to perform the required substitutions.
french_content.gsub(r) { english_lines.shift }
You have a couple of problems:
1. Use (\D+) in your regex to account for the phrase coming after the numbers.
2. Replace each_line with a loop do that reads from both files with gets.
With these two fixes, your code works:
def time_srt(h, m, s, ms) # helper from the question
  h + ":" + m + ":" + s + "," + ms
end

File.open(ARGV[0], 'r') do |file_one|
  File.open(ARGV[1], 'r') do |file_two|
    File.open(ARGV[2], 'w') do |file_out|
      loop do
        line_one = file_one.gets
        line_two = file_two.gets
        break unless line_one and line_two # gets returns nil at EOF
        if line_one.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
          temp = time_srt($1, $2, $3, $4) + ' --> ' +
                 time_srt($5, $6, $7, $8) # take time from line_one
          if line_two.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
            file_out << temp + "\n" + file_two.gets # take phrase from file_two
            file_one.gets # skip phrase in file_one
          end
        else
          file_out << line_two # keep file_two's lines when not a timestamp
        end
      end
    end
  end
end
Testing with your example files yields:
1
00:00:01,874 --> 00:00:05,577
<i>Précédemment...</i>

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.
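Reading the two files in lockstep, one line at a time, assumes every entry has the same number of text lines in both versions. When that is not the case (say, one line of English but two of French), the files drift apart, and a French sequence number can be read while the English side is at a timing line and get dropped, which is likely what produced the missing numbers and blank lines in the question's edit. A minimal sketch of a sturdier alternative pairs the files entry by entry instead, taking the number and time from the first file and the text from the second. It assumes standard SRT files (entries separated by blank lines) that list the same entries in the same order; parse_srt and merge_srt are hypothetical names.

```ruby
# Hypothetical helpers: pair two SRT files entry by entry, not line by line.
# Assumes entries are separated by blank lines and both files list the
# same entries in the same order.

# Split SRT content into [number, time, text_lines] triples.
def parse_srt(content)
  content.split(/\r?\n\s*\r?\n/).map(&:strip).reject(&:empty?).map do |block|
    number, time, *text = block.lines.map(&:chomp)
    [number, time, text]
  end
end

# Keep the number and time from `timing`, the text from `translation`.
def merge_srt(timing, translation)
  timed = parse_srt(timing)
  translated = parse_srt(translation)
  unless timed.size == translated.size
    raise ArgumentError, "entry counts differ: #{timed.size} vs #{translated.size}"
  end
  timed.zip(translated).map { |(number, time, _), (_, _, text)|
    [number, time, *text].join("\n")
  }.join("\n\n") + "\n"
end
```

With the question's command line, this would be used as File.write(ARGV[2], merge_srt(File.read(ARGV[0]), File.read(ARGV[1]))). Multi-line entries stay together because each entry's text lines travel as one array.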