Reputation: 3
I have a .srt file with the Ghost in the Shell 2 subtitles, and I want to remove every piece of dialog except the citations and the translator's notes that go with them. So in:
66
00:12:50,035 --> 00:12:54,096
"What's the point of blaming the mirror
if you don't like what you see."
[Trans. Note: He's quoting Nikolai Vasilevich Gogol.]
I want to select just the:
"What's the point of blaming the mirror
if you don't like what you see."
[Trans. Note: He's quoting Nikolai Vasilevich Gogol.]
So far I got this:
("[\s\S]+?"[[\s\S]+?])
But there's a problem with this one: it also selects any text that lies between a "foobar" and a later [foobar], like this:
"If our gods and our hopes are nothing but scientific phenomena,
then it must be said that our love is scientific as well"
2
00:01:05,732 --> 00:01:08,098
Repo-202 calling air traffic control.
3
00:01:08,201 --> 00:01:09,725
We've arrived over the site.
[The kanji means "Look"]
I just want to select "citation"[note] when they are together.
Upvotes: 0
Views: 173
Reputation: 41838
Here is a way to remove the bad lines in Perl or PCRE regex. For instance, you can do this in Notepad++, which uses PCRE. The demo shows you that the bad lines are selected.
(?m)^\s*(?:(\[(?:[^][]++|(?1))*\])|(?<!\\)"(?:\\"|[^"])*+")(*SKIP)(*F)|.*
Basically, the expression on the left of the main | alternation operator matches all full brackets and double-quoted strings, then deliberately fails and skips to the next position in the string. This leaves the .* at the end free to match the remaining lines, which are the ones you want to replace.
For details of how this works, see this question about Matching (or replacing) a pattern, excluding.....
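Outside PCRE the same idea can be approximated without (*SKIP)(*F). The following is only a minimal sketch in Python, under a few assumptions: Python's built-in re module supports neither the backtracking verbs nor the (?1) recursion, so the sketch swaps in the capture-group variant of the trick (keep what group 1 matched, blank everything else) and does not handle nested brackets. The names KEEP_OR_DROP and blank_other_lines are made up for this illustration.

```python
import re

# Left branch: a [bracketed note] or "quoted string" at the start of a line.
# It is captured in group 1 so it can be put back unchanged.
# Right branch: any other run of characters on a line, which gets dropped.
# Assumption: brackets are not nested (no recursion in Python's re).
KEEP_OR_DROP = re.compile(r'(?m)^[ \t]*(\[[^\[\]]*\]|"(?:\\"|[^"])*")|.+')

def blank_other_lines(text):
    """Blank out every line that is not a quote or a bracketed note."""
    # group(1) is None whenever the right-hand branch matched.
    return KEEP_OR_DROP.sub(lambda m: m.group(1) or "", text)

sample = (
    "66\n"
    "00:12:50,035 --> 00:12:54,096\n"
    "\"What's the point of blaming the mirror\n"
    "if you don't like what you see.\"\n"
    "[Trans. Note: He's quoting Nikolai Vasilevich Gogol.]\n"
)
print(blank_other_lines(sample))
```

Like the PCRE version, this leaves empty lines behind where dialog was removed, and it keeps every quote and every bracketed note, whether or not they appear together.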
Upvotes: 0
Reputation: 726809
I just want to select "citation"[note] when they are together.
However, they are not together in your case: there is a line break separator between the quote and the square bracket. You need to modify your expression to account for that. Of course you also need to escape your square brackets.
In addition, you should replace the reluctantly quantified expressions for the content, [\s\S]+?, with negated character classes. These cannot run past the closing quote or bracket, which limits backtracking, like this:
("[^"]+"\s\[[^\]]+\])
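Finally, you do not need to turn on the "multiline" option of your regex engine for this expression (MULTILINE mode in Java, RegexOptions.Multiline in .NET, and so on). That option only changes where ^ and $ match, and the pattern uses neither; the negated character classes and \s already match across line breaks.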
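To check the corrected expression against the lines from the question, here is a minimal sketch in Python. It makes one assumption beyond the answer: \s is widened to \s+ so that files with CRLF line endings also match. CITATION and sample are names chosen for this illustration only.

```python
import re

# The pattern from above, with \s widened to \s+ (an assumption, see lead-in).
# No flags are passed: the pattern contains no ^, $ or . for them to affect.
CITATION = re.compile(r'("[^"]+"\s+\[[^\]]+\])')

sample = (
    "\"If our gods and our hopes are nothing but scientific phenomena,\n"
    "then it must be said that our love is scientific as well\"\n"
    "2\n"
    "00:01:05,732 --> 00:01:08,098\n"
    "Repo-202 calling air traffic control.\n"
    "3\n"
    "00:01:08,201 --> 00:01:09,725\n"
    "We've arrived over the site.\n"
    "[The kanji means \"Look\"]\n"
    "66\n"
    "00:12:50,035 --> 00:12:54,096\n"
    "\"What's the point of blaming the mirror\n"
    "if you don't like what you see.\"\n"
    "[Trans. Note: He's quoting Nikolai Vasilevich Gogol.]\n"
)

citations = CITATION.findall(sample)
for citation in citations:
    print(citation)
```

On this sample only the Gogol quote and its note are found. The first quote is skipped because the text right after its closing quotation mark is a cue number, not a bracket.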
Upvotes: 1