Reputation: 83
I have the following strings:
[
'全新Precision 5530二合一移动工作站',
'15" (5530)',
'新14"灵越燃7000三边微边框',
'灵越新13"(7380)轻薄本 热卖',
'XPS新15"(9570)热卖',
'新15"灵越5000(Intel)',
'12” 二合一 (5290)'
]
I need to remove every non-Chinese token (such as the product line name and the model number), including model numbers inside parentheses. However, I must not remove (Intel); more generally, any other letter-only string inside parentheses must not be part of the regex match.
For now, I have the following: pattern = /(\w+\s+\d+|\(?\d{4}\)?|[a-z]+)/gi
This, applied to the previous array, returns
[
["Precision 5530"],
["(5530)"],
["7000"],
["(7380)"],
["XPS", "(9570)"],
["5000", "Intel"],
["(5290)"]
]
which is almost perfect, except that "Intel" shouldn't be there. I can't seem to find a regex that excludes Intel (or any other plain letters inside parentheses).
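For reference, a minimal sketch that reproduces the result above. It assumes the matches were collected with String.prototype.match and the g flag, which the question does not state explicitly; the names originalPattern, samples and matchesPerString are only illustrative.

```javascript
// The question's original pattern, unchanged.
const originalPattern = /(\w+\s+\d+|\(?\d{4}\)?|[a-z]+)/gi;

const samples = [
  '全新Precision 5530二合一移动工作站',
  '15" (5530)',
  '新14"灵越燃7000三边微边框',
  '灵越新13"(7380)轻薄本 热卖',
  'XPS新15"(9570)热卖',
  '新15"灵越5000(Intel)',
  '12” 二合一 (5290)'
];

// With the g flag, match() returns every full match, or null when there is none.
const matchesPerString = samples.map((s) => s.match(originalPattern) || []);

// The sixth entry is ["5000", "Intel"]: "(" cannot start any alternative,
// so the engine moves on and [a-z]+ picks up the bare letters.
console.log(matchesPerString);
```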
On regex101: https://regex101.com/r/vqO0BO/2
Can anyone help?
Solution: With the regex provided in the answers (which also matches the parentheses) and a bit of JS, I managed to get the newText I wanted from text.
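// a is the whole match, b is capturing group 1 (undefined when it did not take part)
newText = text.replace(pattern, function(a, b) {
    if (a === b) {
        // Group 1 matched the whole token: replace it with a space
        return " ";
    } else {
        if (a !== undefined) {
            return a;
        } else if (b !== undefined) {
            return b;
        } else { // If a and b are undefined, just replace the "undefined" with ""
            return "";
        }
    }
}).trim();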
Upvotes: 1
Views: 55
Reputation: 627536
I suggest matching the letter-only text inside parentheses without capturing it, and matching and capturing everything else. In the replacement callback, if Group 1 matched, replace the match with a space; if Group 1 did not match, return the whole match so it stays in the string unchanged.
var strs = [
    '全新Precision 5530二合一移动工作站',
    '15" (5530)',
    '新14"灵越燃7000三边微边框',
    '灵越新13"(7380)轻薄本 热卖',
    'XPS新15"(9570)热卖',
    '新15"灵越5000(Intel)',
    '12” 二合一 (5290)'
];
var pattern = /\([a-z]+\)|(\w+\s+\d+|\(?\d{4}\)?|[a-z]+)/gi;
for (var s of strs) {
    console.log(
        s.replace(pattern, function (a, b) {
            return b ? " " : a;
        }).trim()
    );
}
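The same replacement can be wrapped in a small helper, shown here as a sketch rather than a definitive implementation. The function name stripModelTokens is hypothetical; the pattern is the one used above. The callback tests Group 1 against undefined explicitly, which behaves the same as the truthiness check here because every alternative in Group 1 consumes at least one character.

```javascript
// Hypothetical helper name; the pattern is the one from this answer.
function stripModelTokens(text) {
  const pattern = /\([a-z]+\)|(\w+\s+\d+|\(?\d{4}\)?|[a-z]+)/gi;
  return text
    .replace(pattern, (match, group1) =>
      // Group 1 matched: drop the token. Otherwise it was "(letters)": keep it.
      group1 !== undefined ? " " : match
    )
    .trim();
}

console.log(stripModelTokens('新15"灵越5000(Intel)')); // 新15"灵越 (Intel)
console.log(stripModelTokens('XPS新15"(9570)热卖'));   // 新15" 热卖
```

Note that short digit runs such as the 15 in 15" are left in place, because no alternative in the pattern matches fewer than four digits on their own.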
Regex details
\( - a (
[a-z]+ - 1+ letters
\) - a )
| - or
(\w+\s+\d+|\(?\d{4}\)?|[a-z]+) - Group 1: 1+ word chars, 1+ whitespaces and 1+ digits, or an optional (, 4 digits and an optional ), or 1 or more ASCII letters.
Upvotes: 1