Reputation: 11
I am checking that my subtitle files have the correct formatting. There are 3 common errors I am looking for.
Upvotes: 0
Views: 117
Reputation: 4674
A regex that checks for all of those:
^(.*\[(?![0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\]).*|.*\^(?![BI]).*|([^\^\n]*\^[^B\n])*[^\^\n]*\^B([^\^\n]*\^[^B\n])*[^\^\n]*|([^\^\n]*\^[^I\n])*[^\^\n]*\^I([^\^\n]*\^[^I\n])*[^\^\n]*)$
Just type that into the search bar of a regex-enabled text editor and it will find any erroneous lines as defined in your question.
I tested it using the find feature of both Notepad++ (Windows) and TextWrangler (Mac). It should also work in Python, or in any other regex flavor that supports negative lookaheads. When you search, make sure the check box or radio button next to "regular expression" or "grep" is selected. Note that this regex will not work with plain Linux grep, because grep's default (POSIX) syntax doesn't support lookarounds; GNU grep only handles them in its -P (Perl-compatible) mode.
It's definitely not pretty, but it's really just 4 smaller regexes pushed together like ^(rule1|rule2|rule3B|rule3I)$.
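If you'd rather run the check from a script, here is a minimal Python sketch of that "four rules joined with |" structure. The names RULE1, RULE2, RULE3B, RULE3I, BAD_LINE and find_bad_lines, and the sample lines, are my own and only for illustration; the sub-patterns are the ones explained below with their ^ and $ anchors removed.

```python
import re

# The four sub-patterns, without their ^...$ anchors.
RULE1 = r".*\[(?![0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\]).*"
RULE2 = r".*\^(?![BI]).*"
RULE3B = r"([^\^\n]*\^[^B\n])*[^\^\n]*\^B([^\^\n]*\^[^B\n])*[^\^\n]*"
RULE3I = r"([^\^\n]*\^[^I\n])*[^\^\n]*\^I([^\^\n]*\^[^I\n])*[^\^\n]*"

# ^(rule1|rule2|rule3B|rule3I)$
BAD_LINE = re.compile("^(" + "|".join([RULE1, RULE2, RULE3B, RULE3I]) + ")$")


def find_bad_lines(text):
    """Return the 1-based numbers of the lines that break any of the rules."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BAD_LINE.match(line)
    ]


# Made-up sample input, not taken from a real subtitle file.
sample = "\n".join([
    "[00:00:01:00] Hello ^Bworld^B",   # fine
    "[00:00:01] Missing frames",       # breaks rule 1
    "[00:00:02:00] Stray ^X marker",   # breaks rule 2
    "[00:00:03:00] Unclosed ^Bbold",   # breaks rule 3 (B)
    "[00:00:04:00] ^Iitalic^I fine",   # fine
])
print(find_bad_lines(sample))
```

Because each line is tested on its own here, the \n guards inside the patterns are harmless but not strictly needed.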
The first rule is:
^.*\[(?![0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\]).*$
which matches any line that has a "[" that isn't part of the [00:00:00:00] pattern, using a negative lookahead.
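To see the lookahead in action, here is a small Python sketch that applies rule 1 on its own to a few made-up lines. The helper name has_bad_bracket and the sample strings are hypothetical.

```python
import re

# Rule 1, unchanged.
RULE1 = re.compile(r"^.*\[(?![0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\]).*$")


def has_bad_bracket(line):
    """True if the line has a '[' that does not start a [NN:NN:NN:NN] stamp."""
    return RULE1.match(line) is not None


samples = [
    "[00:01:02:03] Hello",        # well-formed stamp
    "[00:01:02] Hello",           # only three fields
    "[00:01:02:03 Hello",         # closing bracket missing
    "[00:01:02:03] see [note]",   # good stamp, but a second stray '['
]
for s in samples:
    print(has_bad_bracket(s), s)
```

Only the first sample passes. The last one is flagged because the rule looks at every "[" on the line, not just the first.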
The second rule is:
^.*\^(?![BI]).*$
which matches any line with a "^" not immediately followed by a B or an I. It uses a negative lookahead again, so that a "^" at the very end of the line is caught too.
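The end-of-line point is easy to check in Python. This is only a sketch: has_bad_caret and NAIVE are names I made up, and NAIVE is a deliberately weaker pattern that uses a character class instead of a lookahead, for comparison.

```python
import re

# Rule 2, unchanged.
RULE2 = re.compile(r"^.*\^(?![BI]).*$")

# Comparison only: a character class needs an actual character after the "^".
NAIVE = re.compile(r"\^[^BI]")


def has_bad_caret(line):
    """True if some '^' on the line is not immediately followed by B or I."""
    return RULE2.match(line) is not None


for s in ["^Bbold^B", "^Iitalic^I", "stray ^X here", "dangling ^"]:
    print(has_bad_caret(s), NAIVE.search(s) is not None, s)
```

For "dangling ^" the lookahead version reports a problem while the character-class version finds nothing, because there is no character left for [^BI] to consume.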
The third rule is a doozy:
^([^\^\n]*\^[^B\n])*[^\^\n]*\^B([^\^\n]*\^[^B\n])*[^\^\n]*$
which matches any line with exactly one instance of the literal ^B used for bold. The ([^\^\n]*\^[^B\n])*[^\^\n]* part matches anything that isn't ^B, and the \^B part matches ^B. I've included \n to prevent multiline matching in Notepad++. You can remove the \n's if you're using grep or any program already doing a line-by-line regex search.
The fourth rule is just the third rule with "I" instead of "B".
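Since rules three and four differ only in the letter, a script can build both from one template. Here is a rough Python sketch; single_marker_rule is a hypothetical helper, and it assumes the letter is a single plain character such as B or I.

```python
import re


def single_marker_rule(letter):
    """Build the 'exactly one ^<letter>' pattern for a plain letter like B or I."""
    # Anything that does not contain ^<letter>.
    not_marker = r"([^\^\n]*\^[^" + letter + r"\n])*[^\^\n]*"
    return re.compile("^" + not_marker + r"\^" + letter + not_marker + "$")


RULE3B = single_marker_rule("B")
RULE3I = single_marker_rule("I")

for s in ["plain text", "^Bunclosed", "^Bpaired^B", "^Ione ^Bpair^B"]:
    print(bool(RULE3B.match(s)), bool(RULE3I.match(s)), s)
```

Note that the pattern really does mean exactly one: a line like ^Ba^Bb^B with three markers is not matched by it, so if that case matters to you it needs a separate check.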
Upvotes: 1