Reputation: 167
I have this string:
string str = "לא קיימת תוכנה לשליחת מיילים במכשיר, אנא פנה אלינו ישירות ל [email protected]";
and I'm trying to split it the following way:
string[0] = "לא קיימת תוכנה לשליחת מיילים במכשיר, אנא פנה אלינו ישירות ל "
string[1] = "[email protected]"
I'm using this split method:
string[] split = Regex.Split(str, @"^[א-ת]+$");
I want to split between Hebrew and English words, but when a word is in the same language as the previous one, it should be appended to the previous piece.
But I cannot make it work. What am I doing wrong?
Thanks
Upvotes: 1
Views: 1008
Reputation: 138037
Here's one approach:
[\p{IsHebrew}\P{L}]+|\P{IsHebrew}+
Use this pattern with Regex.Matches:
var matches = Regex.Matches(input, @"[\p{IsHebrew}\P{L}]+|\P{IsHebrew}+");
The pattern has two parts. It matches either:
[\p{IsHebrew}\P{L}]+ - a block containing Hebrew characters and non-letters, OR
\P{IsHebrew}+ - a block of non-Hebrew characters (including non-Hebrew letters and other non-letter characters).
We're using Unicode named blocks like \p{IsHebrew} and \p{IsBasicLatin}.
A similar option is [\p{IsHebrew}\P{L}]+|[\p{IsBasicLatin}\P{L}]+ - its second part specifically matches a block with Latin (English) letters.
Working example: regex storm, C# example
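As a rough sketch of how the matches could be collected into an array: the class name `ScriptRuns`, the method `SplitRuns`, and the address `user@example.com` are hypothetical placeholders for illustration, not part of the original post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ScriptRuns
{
    // Returns consecutive runs of text: either Hebrew characters plus
    // non-letters, or a run of non-Hebrew characters.
    public static string[] SplitRuns(string input)
    {
        return Regex.Matches(input, @"[\p{IsHebrew}\P{L}]+|\P{IsHebrew}+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample text; the address is a placeholder.
        string input = "אנא פנה אלינו ישירות ל user@example.com";
        foreach (string run in SplitRuns(input))
        {
            // Brackets make leading and trailing spaces visible.
            Console.WriteLine("[" + run + "]");
        }
    }
}
```

With this sample the Hebrew run keeps its trailing space, because the space is a non-letter and is absorbed by the first alternative.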
Upvotes: 2
Reputation: 1915
Why not simply use \p{IsHebrew}?
Something like this:
string str = "לא קיימת תוכנה לשליחת מיילים במכשיר, אנא פנה אלינו ישירות ל [email protected]";
string pattern = @"[\p{IsHebrew}]+";
var hebrewMatchCollection = Regex.Matches(str, pattern);
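// Join the Hebrew runs with spaces; punctuation between runs (such as the comma) is dropped.
string hebrewPart = string.Join(" ", hebrewMatchCollection.Cast<Match>().Select(m => m.Value));
// The text after the last Hebrew run is the non-Hebrew part; Trim() removes its leading space.
var englishPart = Regex.Split(str, pattern).Last().Trim();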
Upvotes: 0
Reputation: 131492
The pattern passed to Regex.Split matches the delimiter, which isn't included in the results. It looks like you want to split between the last Hebrew and the first non-Hebrew character, e.g.:
Regex.Split(str,@"\p{IsHebrew} \P{IsHebrew}")
\p{} matches a character that belongs to a specific Unicode category or named block, while \P{} matches a character that doesn't.
Unfortunately, the last Hebrew and first non-Hebrew characters are part of the delimiter, so this pattern drops them and returns:
לא קיימת תוכנה לשליחת מיילים במכשיר, אנא פנה אלינו ישירות
[email protected]
Capture groups include the characters matched by the delimiter pattern in the results. Simply adding groups, though, as in (\p{IsHebrew}) (\P{IsHebrew}), returns each captured group as a separate result:
לא קיימת תוכנה לשליחת מיילים במכשיר, אנא פנה אלינו ישירות
ל
m
[email protected]
Vladi Pavelka's use of lookbehind and lookahead assertions fixes this. They check the characters around the space without consuming them, so only the space is treated as the delimiter:
Regex.Split(str,@"(?<=\p{IsHebrew}) (?=\P{IsHebrew})")
will return the expected results:
לא קיימת תוכנה לשליחת מיילים במכשיר, אנא פנה אלינו ישירות ל
[email protected]
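The three variants can be compared side by side in a small sketch. The class `SplitVariants`, its method names, and the address `user@example.com` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitVariants
{
    // Hypothetical sample text; user@example.com is a placeholder address.
    public const string Sample = "אנא פנה אלינו ישירות ל user@example.com";

    // The delimiter consumes one character on each side of the space.
    public static string[] Plain(string s) =>
        Regex.Split(s, @"\p{IsHebrew} \P{IsHebrew}");

    // Captured groups come back as extra elements of the result array.
    public static string[] Grouped(string s) =>
        Regex.Split(s, @"(\p{IsHebrew}) (\P{IsHebrew})");

    // Lookarounds are zero-width, so only the space itself is removed.
    public static string[] Lookaround(string s) =>
        Regex.Split(s, @"(?<=\p{IsHebrew}) (?=\P{IsHebrew})");

    public static void Main()
    {
        Console.WriteLine(Plain(Sample).Length);       // 2
        Console.WriteLine(Grouped(Sample).Length);     // 4
        Console.WriteLine(Lookaround(Sample).Length);  // 2
        Console.WriteLine(Lookaround(Sample)[1]);      // user@example.com
    }
}
```

Note that the lookaround version still removes the space, so the first piece ends with the last Hebrew letter rather than a trailing space.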
Upvotes: 0
Reputation: 926
Try this:
string[] split = Regex.Split(str, @"(?<=[א-ת]+) (?=[A-Za-z]+)");
?<= - lookbehind - asserts what immediately PRECEDES the current position
?= - lookahead - asserts what immediately FOLLOWS the current position
This places the split point at the space between Hebrew and Latin characters.
Upvotes: 1
Reputation: 1302
Judging from your input string, it consists of Hebrew text followed by an email address at the end of the string.
The regex can then be (just an example):
\w*@gmail\.com$
You can test the regex here: https://regexr.com/
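A minimal sketch of how such a pattern might be used to cut the string in two, assuming the address really is at the very end. The pattern here is a variant with the dot escaped and at least one word character required; the names `TrailingEmail` and `SplitAtEmail` and the address `someone@gmail.com` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingEmail
{
    // Returns { text before the address, address }, or null when the
    // string does not end with a matching address.
    public static string[] SplitAtEmail(string text)
    {
        Match m = Regex.Match(text, @"\w+@gmail\.com$");
        if (!m.Success)
        {
            return null;
        }
        return new[] { text.Substring(0, m.Index), m.Value };
    }

    public static void Main()
    {
        // Hypothetical sample text; the address is a placeholder.
        string[] parts = SplitAtEmail("אנא פנה אלינו ישירות ל someone@gmail.com");
        Console.WriteLine(parts[1]);  // someone@gmail.com
    }
}
```

Because the prefix is taken with Substring, it keeps the trailing space before the address, which is the shape the question asks for.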
Upvotes: 0
Reputation: 7204
Why not look at it differently? The real question here is how to extract the email addresses from the text.
There are a lot of posts about this question.
For example, this one:
public static void emas(string text)
{
    const string MatchEmailPattern =
        @"(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@"
        + @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\."
        + @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|"
        + @"([a-zA-Z]+[\w-]+\.)+[a-zA-Z]{2,4})";
    Regex rx = new Regex(MatchEmailPattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
    // Find matches.
    MatchCollection matches = rx.Matches(text);
    // Report the number of matches found.
    int noOfMatches = matches.Count;
    // Report on each match.
    foreach (Match match in matches)
    {
        Console.WriteLine(match.Value);
    }
}
Upvotes: 0