Reputation: 354
I want to split my sentence(s) into two parts, because they contain both English and non-English letters. I am using a regex with preg_split to get the English letters and characters. However, it does the opposite: I am left with only the Japanese text and not the English.
String I work with:
すぐに諦めて昼寝をするかも知れない。 I may give up soon and just nap instead.
My attempt:
$parts = preg_split("/[ -~]+$/", $cleanline); // $cleanline is the string above
print_r($parts);
My result
Array ( [0] => すぐに諦めて昼寝をするかも知れない。 [1] => )
As you can see, the second value is empty. How can I get the English and the non-English text into two separate strings? Why is the English text not returned even though the regex is correct (as far as I have tested)?
Upvotes: 2
Views: 1078
Reputation: 91430
You could use lookarounds to split on the boundary between non-alphabetic characters and alphabetic characters or horizontal whitespace:
$str = 'すぐに諦めて昼寝をするかも知れない。 I may give up soon and just nap instead.';
$parts = preg_split("/(?<=[^a-z])(?=[a-z\h])|(?<=[a-z\h])(?=[^a-z])/i", $str, 2);
print_r($parts);
Output:
Array
(
[0] => すぐに諦めて昼寝をするかも知れない。
[1] => I may give up soon and just nap instead.
)
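As for why the original attempt leaves an empty second element: preg_split() discards whatever the pattern matches, so a pattern that matches the English tail removes it from the result. A minimal sketch of an alternative, assuming the English part is always a trailing run of printable ASCII, is to capture both halves with preg_match() instead (the variable names $japanese and $english are only illustrative):

```php
<?php
$cleanline = 'すぐに諦めて昼寝をするかも知れない。 I may give up soon and just nap instead.';

$japanese = null;
$english  = null;

// Group 1: everything up to the trailing ASCII run (lazy, so it stops as early as possible).
// \s* swallows the separating whitespace.
// Group 2: the run of printable ASCII characters (space through tilde) that ends the string.
// The "u" modifier makes PCRE treat the subject as UTF-8 instead of raw bytes.
if (preg_match('/^(.*?)\s*([ -~]+)$/u', $cleanline, $m)) {
    $japanese = $m[1];
    $english  = $m[2];
}

print_r([$japanese, $english]);
```

If the line contains no trailing ASCII run, preg_match() returns 0 and both variables stay null, so the caller can tell that nothing was split.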
Upvotes: 2
Reputation: 6088
If you have two spaces between the two strings, as shown in your example, you can split them easily with a simple \s{2}:
<?php
$s = "すぐに諦めて昼寝をするかも知れない。  I may give up soon and just nap instead.";
$s = preg_split("/\s{2}/", $s);
print_r($s);
?>
Output:
Array
(
[0] => すぐに諦めて昼寝をするかも知れない。
[1] => I may give up soon and just nap instead.
)
Demo: http://ideone.com/uD2W1Q
Upvotes: 2
Reputation: 1571
Try mb_split instead of preg_split:
mb_regex_encoding('UTF-8');
mb_internal_encoding("UTF-8");
// mb_split() takes the pattern without delimiters, unlike preg_split()
$parts = mb_split("[ -~]+$", $cleanline);
Upvotes: 2