Reputation: 23

SED Replace multiple second occurrence of a character

I have .srt files which are in the following format:

0
1
00:00:01,830 --> 00:00:04,740
corresponding text
1

2
00:00:05,280 --> 00:00:10,280
corresponding text
2

3
00:00:10,740 --> 00:00:14,640
corresponding text
3

4
00:00:15,510 --> 00:00:19,260
corresponding text
4

That extra line repeating the subtitle number appears throughout the whole file (line 5, line 6 ... line 540). I tried the command sed '/^[0-9]/ s/.//' and, as expected, it affects every line that starts with a digit, but I don't know how to make it remove only the second occurrence of each number in the range.

The expected result is:

0
1
00:00:01,830 --> 00:00:04,740
corresponding text

2
00:00:05,280 --> 00:00:10,280
corresponding text

3
00:00:10,740 --> 00:00:14,640
corresponding text

4
00:00:15,510 --> 00:00:19,260
corresponding text

How can I achieve this with sed, awk, or any other tool that can process files in batches? There are several files with the same problem.

Thanks!

Upvotes: 1

Answers (3)

karakfa

Reputation: 67507

A direct translation of your description: remove the duplicate number that appears on a line by itself. Print the line if it is not an integer; otherwise print only its first occurrence.

$ awk 'int($0)!=$0 || !a[$0]++' file

0
1
00:00:01,830 --> 00:00:04,740
corresponding text

2
00:00:05,280 --> 00:00:10,280
corresponding text

3
00:00:10,740 --> 00:00:14,640
corresponding text

4
00:00:15,510 --> 00:00:19,260
corresponding text
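
Since the question also asks about several files, here is a minimal sketch of running the same filter over a batch. The function name fix_srt is made up for illustration, and the sketch assumes the files use Unix line endings and that you have a backup, because it overwrites each file.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): rewrite each given file
# in place using the awk filter shown above. A temp file is needed
# because plain awk cannot edit a file in place.
fix_srt() {
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip unmatched globs and non-files
        tmp=$(mktemp) || return 1
        # Print non-integer lines; print a bare integer only the first time.
        if awk 'int($0)!=$0 || !a[$0]++' "$f" > "$tmp"; then
            cat "$tmp" > "$f"        # overwrite contents, keep file permissions
        fi
        rm -f "$tmp"
    done
}

# Usage: fix_srt *.srt
```

Each file gets its own awk run, so the array of seen numbers starts empty for every file.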

Upvotes: 2

Ed Morton

Reputation: 204154

$ awk 'BEGIN{FS=OFS=RS;RS=""} {$NF=""}1' file
0
1
00:00:01,830 --> 00:00:04,740
corresponding text

2
00:00:05,280 --> 00:00:10,280
corresponding text

3
00:00:10,740 --> 00:00:14,640
corresponding text

4
00:00:15,510 --> 00:00:19,260
corresponding text
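
For readers unfamiliar with awk's paragraph mode, the one-liner above can be spelled out as follows. This is only an expanded sketch of the same idea; the wrapper name strip_last_line is made up, and it assumes every block is separated by a blank line and really does end with the unwanted number.

```shell
#!/bin/sh
# Hypothetical wrapper around the same awk program, with comments.
strip_last_line() {
    awk '
        BEGIN {
            FS = OFS = "\n"   # each line of a block becomes one field
            RS = ""           # paragraph mode: blank lines separate records
        }
        { $NF = "" }          # empty the last line of the block
        { print }             # print the block rebuilt with OFS
    ' "$@"
}

# Usage: strip_last_line file
```

Because the last field is emptied rather than deleted, each block still ends with a newline, which is what produces the blank line between blocks in the output.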

Upvotes: 4

Barmar

Reputation: 781833

Using awk, you can track the lines that contain exactly one field. Keep the last such value in a variable, and skip printing a line when it matches that saved value.

awk 'NF == 1 {if (num != "" && $0 == num) next; else num = $0} 1'
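
One caveat with the NF == 1 test: a subtitle whose text is a single word would overwrite the saved value, so the duplicate number after it would be printed. A possible variation, sketched here under the assumption that the counter lines contain only digits and the file uses Unix line endings, matches bare integers instead. The function name dedupe_numbers and the variable have are made up for illustration.

```shell
#!/bin/sh
# Hypothetical variation: remember only lines that are bare integers.
dedupe_numbers() {
    awk '
        /^[0-9]+$/ {                        # line consists only of digits
            if (have && $0 == num) next     # same as last number line: drop it
            num = $0; have = 1              # otherwise remember it
        }
        { print }
    ' "$@"
}

# Usage: dedupe_numbers file
```

This still cannot tell a counter apart from subtitle text that happens to be the same bare number, so treat it as a sketch rather than a complete .srt parser.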

Upvotes: 2
