Reputation: 23
I have .srt files which are in the following format:
0
1
00:00:01,830 --> 00:00:04,740
corresponding text
1
2
00:00:05,280 --> 00:00:10,280
corresponding text
2
3
00:00:10,740 --> 00:00:14,640
corresponding text
3
4
00:00:15,510 --> 00:00:19,260
corresponding text
4
That extra line containing the line number repeats all the way through the subtitle file (line 5, line 6 ... line 540).
I tried the command sed '/^[0-9]/ s/.//'
and, as expected, it changes every line that starts with a number, but I don't know how to make it act only on the second occurrence of each number in the range.
The expected result is:
0
1
00:00:01,830 --> 00:00:04,740
corresponding text
2
00:00:05,280 --> 00:00:10,280
corresponding text
3
00:00:10,740 --> 00:00:14,640
corresponding text
4
00:00:15,510 --> 00:00:19,260
corresponding text
How can I achieve this with sed, awk, or any other tool that can process files in batches? Several files have the same problem.
Thanks!
Upvotes: 1
Views: 217
Reputation: 67507
A direct translation of your description: remove the duplicate number that appears alone on a line. Print the line if it is not an integer; otherwise print only its first occurrence.
$ awk 'int($0)!=$0 || !a[$0]++' file
0
1
00:00:01,830 --> 00:00:04,740
corresponding text
2
00:00:05,280 --> 00:00:10,280
corresponding text
3
00:00:10,740 --> 00:00:14,640
corresponding text
4
00:00:15,510 --> 00:00:19,260
corresponding text
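To handle several files in one go, one option is a shell loop that runs awk once per file, so the a[] array starts empty for each subtitle instead of carrying numbers over from the previous file. This is only a sketch: the function name dedupe_srt and the .fixed output suffix are assumptions made for illustration, not part of the command above.

```shell
#!/bin/sh
# dedupe_srt is a hypothetical helper name; the .fixed suffix is also an assumption.
# awk is started once per file so the "seen" array a[] is empty for each subtitle.
dedupe_srt() {
    for f in "$@"; do
        # Print non-integer lines as-is; print an integer line only the first time it appears.
        awk 'int($0)!=$0 || !a[$0]++' "$f" > "$f.fixed"
    done
}

# Example call: dedupe_srt *.srt
```

Writing to a separate file keeps the originals intact until the result has been checked. Note that a subtitle whose text is just a number already seen in the file would also be dropped.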
Upvotes: 2
Reputation: 204154
$ awk 'BEGIN{FS=OFS=RS;RS=""} {$NF=""}1' file
0
1
00:00:01,830 --> 00:00:04,740
corresponding text
2
00:00:05,280 --> 00:00:10,280
corresponding text
3
00:00:10,740 --> 00:00:14,640
corresponding text
4
00:00:15,510 --> 00:00:19,260
corresponding text
Upvotes: 4
Reputation: 781833
Using awk, you can check whether the line contains exactly one field. If it does, use a variable to hold the last value of that field, and skip printing the line when the two match.
awk 'NF == 1 {if (num != "" && $0 == num) next; else num = $0} 1'
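One caveat with the NF == 1 test: a one-word subtitle line also has a single field, so it overwrites num and the duplicate number after it is no longer skipped. A possible variant, offered as a sketch under that assumption rather than as the method above, tracks only lines that consist solely of digits. The function name skip_repeated_index is hypothetical.

```shell
#!/bin/sh
# skip_repeated_index is a hypothetical name used only for this sketch.
# Only lines made up entirely of digits are tracked, so a one-word
# subtitle line does not overwrite the remembered number.
skip_repeated_index() {
    awk '
        /^[0-9]+$/ {
            # Same number as the last numeric line: drop it.
            if (num != "" && $0 == num) next
            # Otherwise remember it for the next comparison.
            num = $0
        }
        1   # print every line that was not skipped
    ' "$@"
}

# Example call: skip_repeated_index input.srt > output.srt
```

The num != "" guard is kept from the original command so that the very first number in the file is not compared against an unset variable.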
Upvotes: 2