Robert Li

Reputation: 107

BigQuery Find Arabic characters

Is there a way to find all of the rows that contain Arabic characters?

I have a large data set of names, and I would like to pull out all of the Arabic names and treat that text differently than the rest of my data set.

The only option I have read about is to upload a table containing all of the Arabic characters and somehow do a JOIN/match. However, I'd like to avoid this given my lack of knowledge of the Arabic language.

Upvotes: 2

Views: 1031

Answers (1)

Mikhail Berlyant

Reputation: 173028

Hope you enjoy the example below and can adapt it to whatever logic you need to implement.

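-- BigQuery legacy SQL: the commas in FROM act as UNION ALL over the three sample rows.
-- \p{...} matches characters from the named Unicode script; IFNULL turns "no match" into ''.
SELECT 
  v,
  IFNULL(REGEXP_EXTRACT(v, r'([\p{Cyrillic}]+)'), '') AS russian,
  IFNULL(REGEXP_EXTRACT(v, r'([\p{Arabic}]+)'), '') AS arabic,
  IFNULL(REGEXP_EXTRACT(v, r'([\p{Hebrew}]+)'), '') AS hebrew
FROM 
  (SELECT '12 - Table - Таблица' AS v),
  (SELECT '23 - Table - الطاولة' AS v),
  (SELECT '34 - Table - שולחן' AS v)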

The result is:

v                       russian     arabic      hebrew   
12 - Table - Таблица    Таблица          
23 - Table - الطاولة               الطاولة       
34 - Table - שולחן                              שולחן    
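
If the goal is only to pick out the rows that contain Arabic characters, rather than extract the text, the same `\p{Arabic}` class can go in a filter. Below is a minimal sketch, assuming standard SQL (GoogleSQL) instead of the legacy SQL used above; the `names` CTE and its sample values are hypothetical stand-ins for the real table.

```sql
-- Standard SQL (GoogleSQL) sketch. The inline rows are made-up sample data
-- standing in for the real table of names.
WITH names AS (
  SELECT name
  FROM UNNEST(['Robert', 'محمد', 'Таблица', 'Ali علي']) AS name
)
SELECT
  name
FROM names
-- Keep a row if it contains at least one character from the Arabic script.
WHERE REGEXP_CONTAINS(name, r'\p{Arabic}');
```

Swapping the condition to `NOT REGEXP_CONTAINS(...)` should give the rest of the data set, so the two groups can be handled separately without maintaining a lookup table of Arabic characters.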

Upvotes: 4
