Amod

Reputation: 79

Regex match with Arabic

I have some text in Arabic and I want to use Regex to extract the numbers from it. Here is my attempt.

String :

"ما المجموع:

1+2"

Match match = Regex.Match(text, "المجموع: ([^\\r\\n]+)", RegexOptions.IgnoreCase);

The match never succeeds: match.Success is always false and match.Groups[1].Value never contains the expected text.

expected output:

match.Groups[1].Value // returns 1+2

Upvotes: 3

Views: 220

Answers (1)

Wiktor Stribiżew

Reputation: 627343

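The regex you wrote matches the word, then a colon, then a single literal space, and then one or more chars other than CR and LF. (If the pattern were a verbatim @"..." literal, [^\\r\\n] would instead exclude a backslash, r and n.) In your text the colon is followed by a line break, not a space, so the match fails.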
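To see the failure in isolation, here is a minimal sketch that runs the question's pattern against two inputs. The class and method names (OriginalPatternCheck, Matches) are hypothetical and only serve the illustration; the pattern is copied from the question as a regular string literal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternCheck
{
    // Pattern from the question: word, colon, one literal space,
    // then one or more chars other than CR and LF.
    public const string Pattern = "المجموع: ([^\\r\\n]+)";

    // Hypothetical helper: true when the pattern is found anywhere in the text.
    public static bool Matches(string text) => Regex.IsMatch(text, Pattern);

    public static void Main()
    {
        // The colon is followed by a line break, so the literal space cannot match.
        Console.WriteLine(Matches("ما المجموع:\n1+2")); // False

        // The colon is followed by a space, so the same pattern succeeds.
        Console.WriteLine(Matches("ما المجموع: 1+2"));  // True
    }
}
```

The Arabic text itself is not the problem here; only the whitespace after the colon decides whether the pattern matches.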

You want to match the whole line after the word, colon and any amount of whitespace chars:

var text = "ما المجموع:\n1+2";
var result = Regex.Match(text, @"المجموع:\s*(.+)")?.Groups[1].Value;
Console.WriteLine(result); // => 1+2

See the C# demo

Other possible patterns:

@"المجموع:\r?\n(.+)" // Matches only when a CRLF or LF line ending follows the colon
@"المجموع:\n(.+)"    // Matches only when an LF line ending follows the colon

Also, if you run the regex against a long multiline text with CRLF line endings, it makes sense to replace .+ with [^\r\n]+. In a .NET regex, . matches any char except LF, so it also matches the CR char, and the captured value would end with a trailing CR.
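The difference can be checked with a small sketch that captures the same line with both patterns. The class and method names (LineCapture, CaptureWithDot, CaptureWithClass) and the sample text are assumptions made for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineCapture
{
    // Captures with ".+": the dot excludes only LF, so a CR before LF is captured too.
    public static string CaptureWithDot(string text) =>
        Regex.Match(text, @"المجموع:\s*(.+)").Groups[1].Value;

    // Captures with "[^\r\n]+": stops before either CR or LF.
    public static string CaptureWithClass(string text) =>
        Regex.Match(text, @"المجموع:\s*([^\r\n]+)").Groups[1].Value;

    public static void Main()
    {
        // Sample text with CRLF line endings (an assumed input).
        var text = "ما المجموع:\r\n1+2\r\nnext line";

        Console.WriteLine(CaptureWithDot(text).Length);   // 4: "1+2" plus the trailing CR
        Console.WriteLine(CaptureWithClass(text).Length); // 3: just "1+2"
    }
}
```

Calling Trim() or TrimEnd('\r') on the result of the .+ version would also remove the stray CR, but excluding it in the pattern keeps the capture clean at the source.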

Upvotes: 1
