Reputation: 55
I am scripting in Google Apps Script. I'm trying to remove every character that is not listed inside the square brackets by writing:
var cleantext = text.replace(/[^\s\w"!,、。\.??!:]/g,'');
I also want to keep "[" and "]". I have followed some of the tutorials here and tried "\\]" and "\\[":
var cleantext = text.replace(/[^\s\w"!,、。\.??!:"\\]""\\["]/g,'');
or trying \\] and \\[
var cleantext = text.replace(/[^\s\w"!,、。\.??!:\\]\\[]/g,'');
Please feel free to reword my question. I probably don't know what question I'm actually trying to ask, since there are already many similar questions with similar titles here on Stack Overflow.
I wish to edit a whole column of cells, which are a combination of Japanese, Chinese, and English characters.
For example, "こんにちは、私はJimです😃 | [Audio.Category:Jim]" should become "こんにちは、私はJimです [Audio.Category:Jim]".
That is, emojis and any other characters not listed inside the brackets should be deleted.
Upvotes: 1
Views: 754
Reputation: 626920
To include ] and [ in a JavaScript regex character class, you need to escape ] and you do not have to escape [:
/[abc[\]xyz]/
     ^^^
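A quick way to check the escaping rule above, as a minimal sketch. The helper names and the sample string are made up for illustration:

```javascript
// Hypothetical helper: keeps only ASCII letters and the two square brackets.
// Inside the class, "[" is literal as-is, while "]" must be written as "\]".
function keepBracketsAndLetters(text) {
  return text.replace(/[^A-Za-z[\]]/g, '');
}

// Escaping "[" as well is harmless and gives the same result.
function keepBracketsAndLettersEscaped(text) {
  return text.replace(/[^A-Za-z\[\]]/g, '');
}

console.log(keepBracketsAndLetters('a[b]-c!')); // a[b]c
```

In a regex literal, \\] is an escaped backslash followed by an unescaped ], which closes the class early. The doubled backslash is only needed when the pattern is written as a string and passed to new RegExp().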
If you need to support ASCII letters and Japanese only, you need to add the Japanese letter ranges:
/[^\s"!,、。.??!:[\][A-Za-z\u3000-\u303F\u3040-\u309F\u30A0-\u30FF\uFF00-\uFFEF\u4E00-\u9FAF\u2605-\u2606\u2190-\u2195\u203B]+/g
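As a rough illustration, here is that pattern applied to the sample string from the question. The function name stripDisallowed is hypothetical:

```javascript
// The character class from above: everything NOT listed in it gets removed.
const disallowed = /[^\s"!,、。.??!:[\][A-Za-z\u3000-\u303F\u3040-\u309F\u30A0-\u30FF\uFF00-\uFFEF\u4E00-\u9FAF\u2605-\u2606\u2190-\u2195\u203B]+/g;

// Hypothetical helper that only strips the disallowed characters.
function stripDisallowed(text) {
  return text.replace(disallowed, '');
}

const sample = 'こんにちは、私はJimです😃 | [Audio.Category:Jim]';
// The emoji and the "|" are removed, but the two spaces around the "|" remain.
console.log(JSON.stringify(stripDisallowed(sample)));
```

Note that removing the emoji and the pipe leaves two adjacent spaces behind, so a second replace that collapses whitespace runs is still needed to get the output shown in the question.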
Here is a sample solution:
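function myFunction() {
  var sheet = SpreadsheetApp.getActiveSheet();
  // getValue() may return a number or a Date, so coerce to a string before calling replace()
  var cell = String(sheet.getRange('F13').getValue());
  Logger.log(cell);
  var reg_ascii_letter = "A-Za-z";
  // Backslashes are doubled because the pattern is built from a string
  var reg_japanese_letter = "\\u3000-\\u303F\\u3040-\\u309F\\u30A0-\\u30FF\\uFF00-\\uFFEF\\u4E00-\\u9FAF\\u2605-\\u2606\\u2190-\\u2195\\u203B";
  // Negated class: matches runs of characters that are NOT in the allowed set
  var rx = new RegExp("[^\\s\"!,、。.??!:[\\][" + reg_ascii_letter + reg_japanese_letter + "]+", "g");
  Logger.log(rx);
  // Remove the disallowed characters, then collapse runs of whitespace left behind
  var nval = cell.replace(rx, '').replace(/(\s){2,}/g, '$1');
  sheet.getRange('F15').setValue(nval);
}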
In a similar way, you may build a Unicode regex for any letter.
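One possible way to do that without listing ranges by hand is a Unicode property escape. This is only a sketch, and it assumes the script runs on an engine that supports the u flag and \p{...} (ES2018); for Apps Script that would mean the V8 runtime rather than the legacy one. The function name cleanTextUnicode is hypothetical:

```javascript
// Assumption: the engine supports the "u" flag and \p{...} property escapes.
// \p{L} matches any letter in any script, \p{N} any digit;
// the punctuation list is copied from the question.
function cleanTextUnicode(text) {
  var disallowed = /[^\s\p{L}\p{N}"!,、。.?!:\[\]]+/gu;
  // Remove disallowed characters, then collapse runs of whitespace left behind.
  return text.replace(disallowed, '').replace(/(\s){2,}/g, '$1');
}

console.log(cleanTextUnicode('こんにちは、私はJimです😃 | [Audio.Category:Jim]'));
// こんにちは、私はJimです [Audio.Category:Jim]
```

This is broader than the explicit ranges above: \p{L} also keeps letters from scripts other than Latin and Japanese, so the explicit ranges remain the safer choice if only those two should pass through.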
Upvotes: 2