Reputation: 869
I am writing a script that cleans up a file containing non-ASCII characters, line by line, but I am having trouble with a regex pattern. I need a pattern that matches any line that starts with an asterisk, may contain an equals sign, and otherwise consists of non-ASCII characters and spaces. I know how to match a non-ASCII character, but not how to put it in the same character set as other positively defined characters.
Here is a sample line that I need to match:
* = Ìÿð ÿð
Here is the pattern I have so far:
/\*[^[:ascii:]]+[\r\n]/
This matches lines that start with an asterisk and contain non-ASCII characters, but not if the line also has spaces or an equals sign in it.
Upvotes: 1
Views: 1784
Reputation:
Maybe this (edit: changed after rereading the question):
# ^\*(?=.*[^\0-\177])
^
\*
(?= .* [^\0-\177] )
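As a rough sketch of how the lookahead behaves, here is the same pattern in Python. The host language is an assumption (the question does not name one), the helper name is hypothetical, and the octal range [^\0-\177] is written in its equivalent hex form [^\x00-\x7F].

```python
import re

# Asterisk at the start of the line, followed by a lookahead that requires
# at least one non-ASCII character somewhere later on the same line.
# [^\x00-\x7F] is the hex spelling of the octal range [^\0-\177].
STARRED_NON_ASCII = re.compile(r"^\*(?=.*[^\x00-\x7F])")


def starts_with_star_and_has_non_ascii(line: str) -> bool:
    """Hypothetical helper: True if the line starts with '*' and contains non-ASCII."""
    return STARRED_NON_ASCII.search(line) is not None


samples = ["* = Ìÿð ÿð", "* = plain ascii", "Ìÿð * ÿð"]
results = [starts_with_star_and_has_non_ascii(s) for s in samples]
```

Note that the lookahead only checks that a non-ASCII character appears somewhere after the asterisk, so a line such as "* abc Ì" is also accepted. It does not restrict which other characters the line may contain.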
Upvotes: 0
Reputation: 19423
Try the following expression:
^\*\s*=?\s*[[:^ascii:]\s]+[\r\n]*$
This matches the start of the line ^, then a literal asterisk \*, then zero or more whitespace characters \s*, followed by an optional equals sign =? and again zero or more whitespace characters \s*.
The next part, [[:^ascii:]\s]+, matches one or more characters that are each either non-ASCII or whitespace; check the docs for the character class syntax.
Finally, [\r\n]*$ matches any carriage returns and newlines that may end the line.
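A minimal sketch of using this expression to filter lines, assuming Python as the host language (the question does not say which one is in use) and assuming that "cleaning up" means dropping the matching lines. Python's re module has no POSIX classes, so [[:^ascii:]\s] is rewritten here as the explicit set [\s\u0080-\U0010FFFF]; the function names are hypothetical.

```python
import re

# Python translation of ^\*\s*=?\s*[[:^ascii:]\s]+[\r\n]*$
# [\s\u0080-\U0010FFFF] stands in for [[:^ascii:]\s]:
# whitespace, or any code point above the ASCII range.
NON_ASCII_OR_SPACE_LINE = re.compile(
    r"^\*\s*=?\s*[\s\u0080-\U0010FFFF]+[\r\n]*$"
)


def is_junk_line(line: str) -> bool:
    """Hypothetical helper: True if the whole line fits the pattern."""
    return NON_ASCII_OR_SPACE_LINE.match(line) is not None


def clean(lines):
    """Assumed cleanup step: keep only the lines that do not match."""
    return [line for line in lines if not is_junk_line(line)]


sample = ["* = Ìÿð ÿð\n", "* = keep me\n", "plain text\n", "*Ìÿð\r\n"]
cleaned = clean(sample)
```

Because the final character set also accepts plain whitespace, a line consisting of an asterisk followed only by spaces (for example "* ") matches as well. If that is not wanted, the lookahead from the other answer could be combined with this pattern.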
Upvotes: 3