Ruby script issue

Question

I'm trying to write a Ruby script, but I'm struggling with how to proceed.

The goal of my script is to take two subtitle files and build a third one that combines the timing of the first file with the subtitles of the second.

I'm not sure that's clear, so basically: my two subtitle files contain the same dialogue, but they come from different versions, i.e. the timing of the same sequence may differ between them.

Here is the desired result:

# File One : Version KILLERS (English) 

1
00:00:01,874 --> 00:00:05,577
Previously on "12 Monkeys"...

2
00:00:05,625 --> 00:00:07,882
- Did Marion send you?
- Who?

3
00:00:07,938 --> 00:00:09,905
Marion, the boy's mother.

# File Two : Version AMZN (translated into French)

1
00:00:08,140 --> 00:00:11,850
Précédemment...

2
00:00:12,260 --> 00:00:14,120
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:14,150 --> 00:00:16,110
Marion, sa mère.

# File Three : Objective -> a KILLERS version of the French subtitles

1
00:00:01,874 --> 00:00:05,577
Précédemment...

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.

Let's call the timestamp (00:00:07,938, for instance) "time" and the dialogue "text".

In other words, I want to replace every "time" line in the second file with the corresponding time line from the first file.
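Stated as a data operation, the objective is small. This is a minimal sketch, assuming each file has already been split into [time, text] pairs in the same order; the pair representation and the name combine are hypothetical and not part of the script below.

```ruby
# Hypothetical helper: given two lists of [time, text] pairs in matching order,
# keep the time from the first list and the text from the second.
def combine(timed, translated)
  timed.zip(translated).map { |(time, _), (_, text)| [time, text] }
end

timed = [["00:00:01,874 --> 00:00:05,577", "Previously on \"12 Monkeys\"..."]]
translated = [["00:00:08,140 --> 00:00:11,850", "Précédemment..."]]
combine(timed, translated)
# => [["00:00:01,874 --> 00:00:05,577", "Précédemment..."]]
```

Pairing by position like this only works if both versions split the dialogue into the same number of blocks; zip fills any missing text with nil.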

Here is what I tried:

#!/usr/bin/ruby -w

# To run the script: ./script.rb TVshowENG.srt TVshowESP.srt NewTVshow.srt

# Rebuild an SRT timestamp (hh:mm:ss,mmm) from its captured parts
def time_srt(h, m, s, ms)
  h + ":" + m + ":" + s + "," + ms
end

File.open(ARGV[0], 'r') do |file_one|
  File.open(ARGV[1], 'r') do |file_two|
    File.open(ARGV[2], 'w') do |file_out|
      file_one.each_line do |line|
        if line.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
          file_out << time_srt($1, $2, $3, $4) + ' --> ' +
                      time_srt($5, $6, $7, $8) + "\n"
        else
          file_out << line # I want to print the line from file_two here
        end
      end
    end
  end
end

Edit: random glitches in the final script

1 - Script running issue: the script runs, then suddenly stops with this error:

./SyncScript.rb:25:in `block (4 levels) in <main>': invalid byte sequence in UTF-8 (ArgumentError)
    from ./SyncScript.rb:18:in `loop'
    from ./SyncScript.rb:18:in `block (3 levels) in <main>'
    from ./SyncScript.rb:17:in `open'
    from ./SyncScript.rb:17:in `block (2 levels) in <main>'
    from ./SyncScript.rb:16:in `open'
    from ./SyncScript.rb:16:in `block in <main>'
    from ./SyncScript.rb:15:in `open'
    from ./SyncScript.rb:15:in `<main>'

2 - Glitches (details in the # comments):

14    # (From 1 to 16, it worked perfectly fine)
00:00:38,535 --> 00:00:40,832
Elle est revenue !

15
00:00:49,746 --> 00:00:51,620
Ce que je vais te dire

16
00:00:51,715 --> 00:00:54,749
est la légende telle
que l'on me l'a racontée.

00:01:01,650 --> 00:01:04,190  # Suddenly, there's no sequence number (here, 17)
Il y avait autrefois un serpent

00:01:04,300 --> 00:01:06,870  # Same here (no 18)
qui n'allait
que dans une seule direction.
19   # Yet, there is a 19. But no blank line before it ('\n' issue)
00:01:07,150 --> 00:01:10,740
Toujours en avant, jamais en arrière.
20  # Same...
00:01:11,020 --> 00:01:13,080
Jusqu'au jour où,
21
00:01:13,260 --> 00:01:16,770
le serpent tomba sur un démon.
22
00:01:21,670 --> 00:01:22,730
Stop !
23
00:01:38,050 --> 00:01:40,050
Ho ! Ho !
24
00:01:59,030 --> 00:02:01,090
Je suis sûr que vous ne serez pas
surpris d'apprendre que
25
00:02:01,130 --> 00:02:04,310
nous avons vu votre venue

26
00:02:06,380 --> 00:02:11,020  # No subtitle text here??

27
00:02:11,690 --> 00:02:13,390
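About the first issue: "invalid byte sequence in UTF-8" usually means one of the input files is not actually UTF-8 (French subtitles are often saved as Windows-1252 or ISO-8859-1), and the error is typically raised when a regex is matched against a line containing such bytes. A minimal sketch, assuming Windows-1252 is the right fallback encoding; decode_subtitles and read_subtitles are hypothetical helper names.

```ruby
# Hypothetical helper: interpret raw bytes as UTF-8 if they are valid UTF-8,
# otherwise assume Windows-1252 (an assumption) and transcode to UTF-8.
def decode_subtitles(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  text = if utf8.valid_encoding?
           utf8
         else
           raw.dup.force_encoding(Encoding::Windows_1252)
              .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
         end
  text.delete_prefix("\uFEFF") # drop a UTF-8 byte-order mark if present
end

# Read a whole subtitle file as bytes and decode it.
def read_subtitles(path)
  decode_subtitles(File.binread(path))
end
```

If you already know which file uses the legacy encoding, an alternative is to let Ruby transcode while reading, e.g. File.open(ARGV[1], 'r:windows-1252:utf-8').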

River · Accepted Answer

You have a couple of problems.

Your regex doesn't match the whole line; you need to add (\D+) to account for the phrase coming after the numbers.
You want to loop through both files concurrently; this is easy to do with a loop do block that calls gets on each file.
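The lockstep idea in the second point can also be written with external enumerators instead of gets: Kernel#loop rescues StopIteration, so the loop ends as soon as either file runs out of lines. This is a sketch; each_line_pair is a hypothetical name.

```ruby
# Yield one line from each IO at a time, stopping when either is exhausted.
# Kernel#loop rescues the StopIteration raised by Enumerator#next at the end.
def each_line_pair(io_one, io_two)
  return enum_for(:each_line_pair, io_one, io_two) unless block_given?

  lines_one = io_one.each_line
  lines_two = io_two.each_line
  loop { yield lines_one.next, lines_two.next }
end
```

Like the gets version, this pairs lines strictly by position, so it assumes both files have the same number of lines in every block; a subtitle split over two lines in one file but one line in the other shifts every pair that follows.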

With these two fixes, your code works:

# Uses time_srt from the question
File.open(ARGV[0], 'r') do |file_one|
    File.open(ARGV[1], 'r') do |file_two|
        File.open(ARGV[2], 'w') do |file_out|
            loop do
                line_one = file_one.gets
                line_two = file_two.gets
                break unless line_one and line_two # gets returns nil at EOF
                if line_one.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
                    temp = time_srt($1, $2, $3, $4) + ' --> ' +
                        time_srt($5, $6, $7, $8) # take the time from line_one
                    if line_two.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
                        # take the phrase from file_two (to_s guards against nil at EOF)
                        file_out << temp + "\n" + file_two.gets.to_s
                        file_one.gets # skip the phrase in file_one
                    end
                else
                    file_out << line_two # keep file_two's lines when not a timestamp
                end
            end
        end
    end
end

Testing with your example files yields:

1
00:00:01,874 --> 00:00:05,577
Précédemment...

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.
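Because the line-by-line loop depends on both files having identical line layouts (which may explain some of the glitches in the edit), a block-based merge is another option. The following is a sketch, not part of the accepted answer: it splits each file into blocks on blank lines, finds the timestamp in each block by regex (so missing sequence numbers don't matter), keeps the timestamp from the first file and the text lines from the second, and renumbers the result. srt_blocks and merge_srt are hypothetical names, and filter_map needs Ruby 2.7 or later.

```ruby
# Timestamp line, e.g. "00:00:01,874 --> 00:00:05,577"
TIMESTAMP = /\A\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}/

# Split SRT content into [timestamp_line, text_lines] pairs.
# Blocks are assumed to be separated by blank lines; sequence numbers are ignored.
def srt_blocks(content)
  content.gsub("\r\n", "\n").split(/\n\s*\n/).filter_map do |chunk|
    lines = chunk.strip.lines.map(&:chomp)
    time_at = lines.index { |line| line.match?(TIMESTAMP) }
    [lines[time_at], lines[(time_at + 1)..]] if time_at
  end
end

# Keep the timing of timing_srt and the text of text_srt, pairing blocks
# by position and renumbering from 1.
def merge_srt(timing_srt, text_srt)
  timing = srt_blocks(timing_srt)
  text = srt_blocks(text_srt)
  unless timing.size == text.size
    warn "block counts differ: #{timing.size} vs #{text.size}"
  end
  count = [timing.size, text.size].min
  merged = (0...count).map do |i|
    [(i + 1).to_s, timing[i][0], *text[i][1]].join("\n")
  end
  merged.join("\n\n") + "\n"
end
```

As a script this could be called as File.write(ARGV[2], merge_srt(File.read(ARGV[0]), File.read(ARGV[1]))). Pairing is still by position, so if the two versions split the dialogue into a different number of blocks the output drifts; the warn line only reports the mismatch.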
