John Doe

Reputation: 53

Ruby script issue

I'm trying to write a Ruby script, but I'm not sure how to proceed.

The goal is to build a third subtitle file from two existing ones by combining the timings of the first file with the subtitle text of the second.

In case that is unclear: my two subtitle files contain the same sequences, but they come from different versions (releases), so the timing of the same sequence may differ between them.

Here is the desired result:

# File One : Version KILLERS (English) 

1
00:00:01,874 --> 00:00:05,577
<i>Previously on "12 Monkeys"...</i>

2
00:00:05,625 --> 00:00:07,882
- Did Marion send you?
- Who?

3
00:00:07,938 --> 00:00:09,905
Marion, the boy's mother.

# File Two : Version AMZN (Translated in French)

1
00:00:08,140 --> 00:00:11,850
<i>Précédemment...</i>

2
00:00:12,260 --> 00:00:14,120
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:14,150 --> 00:00:16,110
Marion, sa mère.

# File Three : Objective -> Getting a KILLERS version of the French subtitles

1
00:00:01,874 --> 00:00:05,577
<i>Précédemment...</i>

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.

We can assign the variable "time" for the time (00:00:07,938 for instance), and the variable "text" for the text.

Therefore, I want to replace every "time line" in the second file with the corresponding time line from the first file.

Here is what I tried:

#!/usr/bin/ruby -w

#To run the script : ./script.rb TVshowENG.srt TVshowESP.srt NewTVshow.srt


def time_srt(h, m, s, ms)
   h + ":" + m + ":" + s + "," + ms
end

File.open(ARGV[0], 'r') do |file_one|
    File.open(ARGV[1], 'r') do |file_two|
       File.open(ARGV[2], 'w') do |file_out|
           file_one.each_line do |line|
               if line.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
                   file_out << time_srt($1, $2, $3, $4) + ' --> ' +
                                time_srt($5, $6, $7, $8) + "\n"
               else
                   file_out << line # I want to print line from file_two there
               end
           end
       end
    end
end 

Edit: random glitches with the final script

1 - Script running issue: the script runs, then suddenly stops with this error:

./SyncScript.rb:25:in `block (4 levels) in <main>': invalid byte sequence in UTF-8 (ArgumentError)
    from ./SyncScript.rb:18:in `loop'
    from ./SyncScript.rb:18:in `block (3 levels) in <main>'
    from ./SyncScript.rb:17:in `open'
    from ./SyncScript.rb:17:in `block (2 levels) in <main>'
    from ./SyncScript.rb:16:in `open'
    from ./SyncScript.rb:16:in `block in <main>'
    from ./SyncScript.rb:15:in `open'
    from ./SyncScript.rb:15:in `<main>'

2 - Glitches (more details in the # comments)

14    # Sequences 1 to 16 worked perfectly fine
00:00:38,535 --> 00:00:40,832
Elle est revenue !

15
00:00:49,746 --> 00:00:51,620
<i>Ce que je vais te dire</i>

16
00:00:51,715 --> 00:00:54,749
<i>est la légende telle
que l'on me l'a racontée.</i>

00:01:01,650 --> 00:01:04,190  # Suddenly, there's no sequence number (here, 17)
<i>Il y avait autrefois un serpent</i>

00:01:04,300 --> 00:01:06,870  # Same here (no 18)
<i>qui n'allait
que dans une seule direction.</i>
19   # Yet 19 is there, but with no blank line before it ('\n' issue)
00:01:07,150 --> 00:01:10,740
<i>Toujours en avant, jamais en arrière.</i>
20  # Same...
00:01:11,020 --> 00:01:13,080
<i>Jusqu'au jour où,</i>
21
00:01:13,260 --> 00:01:16,770
<i>le serpent tomba sur un démon.</i>
22
00:01:21,670 --> 00:01:22,730
Stop !
23
00:01:38,050 --> 00:01:40,050
Ho ! Ho !
24
00:01:59,030 --> 00:02:01,090
Je suis sûr que vous ne serez pas
surpris d'apprendre que
25
00:02:01,130 --> 00:02:04,310
nous avons vu votre venue

26
00:02:06,380 --> 00:02:11,020  #No subtitles there ??

27
00:02:11,690 --> 00:02:13,390

Upvotes: 4

Views: 135

Answers (2)

Cary Swoveland

Reputation: 110685

This is a more Ruby-like way to perform the desired substitutions.

Code

def substitute(french_fname, english_fname)
  r = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/
  english_lines = File.read(english_fname).scan(r)
  File.read(french_fname).gsub(r) { english_lines.shift }
end

Example

Here is the content of your two files written as heredocs. (Search for “Here Document” in the Ruby syntax documentation on literals.)

english =<<IGNOMINIOUS_END
1
00:00:01,874 --> 00:00:05,577
<i>Previously on "12 Monkeys"...</i>

2
00:00:05,625 --> 00:00:07,882
- Did Marion send you?
- Who?

3
00:00:07,938 --> 00:00:09,905
Marion, the boy's mother.
IGNOMINIOUS_END

and

french =<<BITTER_END
1
00:00:08,140 --> 00:00:11,850
<i>Précédemment...</i>

2
00:00:12,260 --> 00:00:14,120
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:14,150 --> 00:00:16,110
Marion, sa mère.
BITTER_END

Now let’s create the two files, which I’ll name “english” and “french”.

ENGLISH_FNAME = "english"
FRENCH_FNAME  = "french"

File.write(ENGLISH_FNAME, english)
  #=> 191
File.write(FRENCH_FNAME, french)
  #=> 182

We may now use the method substitute to compute the contents of the French file revised as desired (a string, which of course could be written to a file).

puts substitute(FRENCH_FNAME, ENGLISH_FNAME)
1
00:00:01,874 --> 00:00:05,577
<i>Précédemment...</i>

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.
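
As mentioned, the returned string could be written to a third file. A minimal sketch, assuming the method takes the two file names as arguments; write_resynced and the output file name in the usage note are hypothetical names, not part of the original code:

```ruby
# The substitute method from above, repeated so that this sketch runs on its own.
def substitute(french_fname, english_fname)
  r = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/
  english_lines = File.read(english_fname).scan(r)
  File.read(french_fname).gsub(r) { english_lines.shift }
end

# Hypothetical helper: writes the resynced subtitles to out_fname and
# returns the number of bytes written (the return value of File.write).
def write_resynced(french_fname, english_fname, out_fname)
  File.write(out_fname, substitute(french_fname, english_fname))
end
```

With the two files created above, this could be called as write_resynced(FRENCH_FNAME, ENGLISH_FNAME, "french_resynced.srt").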

Explanation

If the two files were huge one would want to read them line-by-line, but it's reasonable to assume they are quite modest in size (even if the running time of the film were, say, 1,000 hours), so I will gulp them into strings.

english_content = File.read(ENGLISH_FNAME)
french_content  = File.read(FRENCH_FNAME)

See IO#read and recall that File.superclass #=> IO.
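
For the huge-file case mentioned above, a line-by-line version might look like the following. This is only a sketch: STREAM_TIME_LINE, next_time_line and substitute_streaming are names I made up, and it assumes that a French time line should be kept unchanged once the English file has no time lines left.

```ruby
# Matches one complete time line, tolerating a trailing "\n" or "\r\n".
STREAM_TIME_LINE = /\A\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\s*\z/

# Reads forward in io and returns the next time line, or nil at end of file.
def next_time_line(io)
  while (line = io.gets)
    return line if STREAM_TIME_LINE.match?(line)
  end
  nil
end

# Copies the French file to out_fname one line at a time, replacing each
# time line with the next time line found in the English file.
def substitute_streaming(french_fname, english_fname, out_fname)
  File.open(english_fname, "r") do |english|
    File.open(out_fname, "w") do |out|
      File.foreach(french_fname) do |line|
        if STREAM_TIME_LINE.match?(line)
          # Assumption: keep the French timing if the English file runs out.
          out << (next_time_line(english) || line)
        else
          out << line
        end
      end
    end
  end
end
```

Only one line of each file is held in memory at a time, at the cost of a little more code than the read-everything approach.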

Next, let's use String#scan with a regular expression to extract an array of the substitutor lines from english_content.

r = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/

english_lines = english_content.scan(r)
  #=> ["00:00:01,874 --> 00:00:05,577",
  #    "00:00:05,625 --> 00:00:07,882",
  #    "00:00:07,938 --> 00:00:09,905"]

Note that "^" and "$" are beginning-of-line and end-of-line anchors, not to be confused with the beginning-of-string ("\A") and end-of-string ("\z") anchors.
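
The difference between the two kinds of anchors can be seen in a small sketch. The constant names and sample strings are mine, chosen for illustration. The last check also shows a caveat: "$" matches just before "\n", so a file with Windows ("\r\n") line endings would not match this regular expression as written.

```ruby
TIMESTAMP = '\d{2}:\d{2}:\d{2},\d{3}'

LINE_ANCHORED   = /^#{TIMESTAMP} --> #{TIMESTAMP}$/    # anchored to a line
STRING_ANCHORED = /\A#{TIMESTAMP} --> #{TIMESTAMP}\z/  # anchored to the whole string

SAMPLE      = "1\n00:00:01,874 --> 00:00:05,577\nsome text\n"
CRLF_SAMPLE = "1\r\n00:00:01,874 --> 00:00:05,577\r\nsome text\r\n"

# A time line in the middle of a multi-line string is found with ^ and $ ...
LINE_MATCHES_SAMPLE = LINE_ANCHORED.match?(SAMPLE)            #=> true
# ... but not with \A and \z, which require the whole string to be a time line.
STRING_MATCHES_SAMPLE = STRING_ANCHORED.match?(SAMPLE)        #=> false
STRING_MATCHES_ALONE =
  STRING_ANCHORED.match?("00:00:01,874 --> 00:00:05,577")     #=> true
# With "\r\n" line endings a "\r" sits between the digits and the "\n".
LINE_MATCHES_CRLF = LINE_ANCHORED.match?(CRLF_SAMPLE)         #=> false
```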

Lastly, we use the method String#gsub with the same regular expression to perform the required substitutions.

french_content.gsub(r) { english_lines.shift }
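
One thing to be aware of: if the French file has more time lines than the English one, english_lines.shift eventually returns nil, and gsub replaces the match with an empty string. A hedged variant, assuming you would rather keep the French timing in that case (substitute_times and TIME_LINE_RE are illustrative names, not part of the code above):

```ruby
TIME_LINE_RE = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/

# Works on strings rather than file names. Each French time line is replaced
# by the next English one; when none is left, the French line is kept.
def substitute_times(french_content, english_content)
  english_lines = english_content.scan(TIME_LINE_RE)
  french_content.gsub(TIME_LINE_RE) { |match| english_lines.shift || match }
end
```

When both files contain the same number of time lines, this behaves the same as the version with a bare shift.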

Upvotes: 1

River

Reputation: 9093

You have a couple of problems.

  1. Your regex doesn't match the whole line; you need to add (\D+) to account for the phrase coming after the numbers.
  2. You want to loop through both files concurrently; this is easily accomplished using loop do with gets.

With these two fixes, your code works:

File.open(ARGV[0], 'r') do |file_one|
    File.open(ARGV[1], 'r') do |file_two|
        File.open(ARGV[2], 'w') do |file_out|
            loop do
                line_one = file_one.gets
                line_two = file_two.gets
                break unless line_one and line_two #gets gives nil at EOF
                if line_one.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
                    temp = time_srt($1, $2, $3, $4) + ' --> ' +
                        time_srt($5, $6, $7, $8) #take time from line_one
                    if line_two.strip =~ /^(\d{2}):(\d{2}):(\d{2}),(\d{3})\D+(\d{2}):(\d{2}):(\d{2}),(\d{3})$/
                        file_out << temp + "\n" + file_two.gets #take phrase from file_two
                        file_one.gets #skip phrase in file_one
                    end
                else
                    file_out << line_two #Keep file_two's lines when not a timestamp
                end
            end
        end
    end
end

Testing with your example files yields:

1
00:00:01,874 --> 00:00:05,577
<i>Précédemment...</i>

2
00:00:05,625 --> 00:00:07,882
- C'est Marion qui vous envoie ?
- Qui ?

3
00:00:07,938 --> 00:00:09,905
Marion, sa mère.
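
Note that the loop above advances both files in lockstep, so it appears to rely on each sequence having the same number of text lines in both files. If that does not hold for your files, one alternative is to pair whole sequences by position instead of pairing lines. This is a rough sketch, not a drop-in replacement: parse_cues and resync are hypothetical names, and it assumes sequences are separated by blank lines and that the subtitle text itself contains no blank lines.

```ruby
# Splits SRT content into cues: a sequence number, a time line and text lines.
def parse_cues(content)
  content.gsub("\r\n", "\n").strip.split(/\n{2,}/).map do |block|
    number, timing, *text = block.lines.map(&:chomp)
    { number: number, timing: timing, text: text }
  end
end

# Returns the cues of text_content with the timings of times_content,
# matched by position. Cues without a counterpart keep their own timing.
def resync(times_content, text_content)
  times = parse_cues(times_content)
  parse_cues(text_content).each_with_index.map do |cue, i|
    timing = times[i] ? times[i][:timing] : cue[:timing]
    [cue[:number], timing, *cue[:text]].join("\n")
  end.join("\n\n") + "\n"
end
```

It could be used as File.write(ARGV[2], resync(File.read(ARGV[0]), File.read(ARGV[1]))), following the argument order of the script in the question.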

Upvotes: 3
