Reputation: 683
I want to search a table in a postgres DB which contains both Arabic and English text. For example:
id | content
-----------------
1 | دجاج
2 | chicken
3 | دجاج chicken
The result would get me row 3.
I imagine this has to do with limiting characters using regex, but I cannot find a clean solution to select both. I tried:
SELECT regexp_matches(content, '^([x00-\xFF]+[a-zA-Z][x00-\xFF]+)*')
FROM mg.messages;
However, this only matches english and some non english characters within {}.
Upvotes: 5
Views: 2587
Reputation: 479
If you want to find all articles that has at least one Unicode characters from the Arabic range (U+0600 -> U-06FF), you would have to use the following:
SELECT content FROM mg.messages WHERE content ~ E'[\u0600-\u06FF]';
Which would indeed return id 1 (Arabic only), ...you would have to adapt the pattern to match any Arabic character followed or preceded by another ASCII (english?) character.
If you want to search for any other character set (range), here is a list of all the Unicode Blocks (Hebrew, Greek, Cyrillic, Hieroglyphs, Ideographs, dingbats, etc.)
Upvotes: 1
Reputation: 12749
I know nothing about Arabic text or RTL languages in general, but this worked:
create table phrase (
id serial,
phrase text
);
insert into phrase (phrase) values ('apple pie');
insert into phrase (phrase) values ('فطيرة التفاح');
select *
from phrase
where phrase like ('apple%')
or phrase like ('فطيرة%');
http://sqlfiddle.com/#!15/75b29/2
Upvotes: 2