Tanoro

Reputation: 869

Regex match character and non-ascii characters

I am writing a script that cleans up a file containing non-ASCII characters, line by line, but I am having trouble with a regex pattern. I need a pattern that matches any line that starts with an asterisk, may contain an equals sign, and otherwise consists of non-ASCII characters and spaces. I know how to match a non-ASCII character, but not how to put it in the same set as other positively defined characters.

Here is a sample line that I need to match:

* = Ìÿð ÿð

Here is the pattern I have so far:

/\*[^[:ascii:]]+[\r\n]/

This matches lines that start with an asterisk and contain non-ASCII characters, but not if the line also has spaces or an equals sign in it.

Upvotes: 1

Views: 1784

Answers (2)

user557597

Reputation:

Maybe this (edit: changed after rereading the question):

 # ^\*(?=.*[^\0-\177])

 ^ 
 \*
 (?= .* [^\0-\177] )
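
For anyone who wants to try the lookahead, here is a minimal sketch in Python. The language is an assumption, since the question does not name one, and the function name is made up for illustration. The octal range \0-\177 is written in hex as \x00-\x7F, which covers the same code points.

```python
import re

# Assumed Python port of ^\*(?=.*[^\0-\177]); octal 177 equals hex 7F.
# The lookahead only checks that some non-ASCII character appears later
# on the line; it does not restrict what else the line contains.
LOOKAHEAD = re.compile(r"^\*(?=.*[^\x00-\x7F])")


def starts_with_star_and_has_non_ascii(line: str) -> bool:
    """Hypothetical helper: True if the line starts with '*' and has a non-ASCII char."""
    return LOOKAHEAD.match(line) is not None


if __name__ == "__main__":
    for sample in ["* = Ìÿð ÿð", "* abc Ì", "* = plain", "Ì * x"]:
        print(repr(sample), starts_with_star_and_has_non_ascii(sample))
```

Note that this is looser than what the question asks for: a line such as "* abc Ì" is also accepted, because ASCII letters between the asterisk and the non-ASCII character are not ruled out.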

Upvotes: 0

Ibrahim Najjar

Reputation: 19423

Try the following expression:

^\*\s*=?\s*[[:^ascii:]\s]+[\r\n]*$

This matches the start of the line ^, then a literal asterisk \*, then zero or more whitespace characters \s*, followed by an optional equals sign =?, then zero or more whitespace characters \s* again.

Next, [[:^ascii:]\s]+ matches one or more characters that are any combination of non-ASCII characters and whitespace. Check the docs for the character class syntax.

Finally, [\r\n]*$ matches any carriage returns and newlines that may end the line.
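
As a rough check, the pattern can be tried against the sample line from the question. This is a sketch in Python, which is an assumption since the question does not name a language. Python's re module has no POSIX classes such as [[:^ascii:]], so the explicit range \x80-\U0010FFFF is swapped in for "non-ASCII", and the name is_target_line is hypothetical.

```python
import re

# Assumption: Python stand-in for ^\*\s*=?\s*[[:^ascii:]\s]+[\r\n]*$
# [\s\x80-\U0010FFFF] is a positive class: whitespace OR any code point
# above 0x7F, which plays the role of [[:^ascii:]\s].
PATTERN = re.compile(r"^\*\s*=?\s*[\s\x80-\U0010FFFF]+[\r\n]*$")


def is_target_line(line: str) -> bool:
    """Hypothetical helper: True if the whole line fits the pattern."""
    return PATTERN.match(line) is not None


if __name__ == "__main__":
    for sample in ["* = Ìÿð ÿð", "*Ìÿð\r\n", "* = hello", "Ìÿð ÿð"]:
        print(repr(sample), is_target_line(sample))
```

Unlike a plain negated class, this keeps spaces and non-ASCII characters in one set, so they can appear in any order after the optional equals sign.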

Regex101 Demo

Upvotes: 3
