Reputation: 71
I have created a regular expression for strings that start with " and end with ", e.g. "mynameis":
"\"(?:[^\"\\]|\\.)*\""
Now I want this expression to skip the words {we, us, they, and}. How do I do that? For instance, if I input "mynameisalexand", the regex must ignore {and} and give me the string "mynameisalex".
Upvotes: 1
Views: 581
Reputation: 627410
Since there is no way to match non-contiguous text with a regex (a match is always one continuous substring of the input), you can still use your regex, or an unrolled version of it:
"[^"\\]*(?:\\.[^"\\]*)*"
and then remove the substrings you defined with a plain String.Replace (or with a regex like we|and|...).
See the C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "\"mynamesarealexandandrew\" \"mynameisalexand\"";
// Unrolled pattern: a double-quoted string that may contain backslash escapes
var regex = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");
var results = regex.Matches(input).Cast<Match>()
    .Select(p => p.Value.Replace("we", "")   // strip each unwanted word
                        .Replace("us", "")   // from every quoted match
                        .Replace("they", "")
                        .Replace("and", ""))
    .ToList();
foreach (var s in results) // DEMO
{
    Console.WriteLine(s);
}
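A minimal sketch of the regex-based alternative mentioned above, assuming a C# 9+ program with top-level statements: instead of chaining four String.Replace calls, a single Regex.Replace with the alternation we|us|they|and strips the unwanted words from each quoted match. The helper name RemoveWords is hypothetical. Like the chained Replace calls, it removes the words anywhere inside the string, including inside longer words such as "andrew".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Unrolled pattern for a double-quoted string that may contain backslash escapes.
var quoted = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

// One alternation instead of four chained Replace calls (word list from the question).
var unwanted = new Regex("we|us|they|and");

// Hypothetical helper: delete every occurrence of the unwanted words.
string RemoveWords(string s) => unwanted.Replace(s, "");

var input = "\"mynamesarealexandandrew\" \"mynameisalexand\"";
var results = quoted.Matches(input)
    .Cast<Match>()
    .Select(m => RemoveWords(m.Value))
    .ToList();

foreach (var s in results)
{
    Console.WriteLine(s);
}
```

A single pass scans the string once. In rare edge cases it can differ from the chained version, where removing one word can create a new occurrence of a word that a later Replace call then deletes.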
Upvotes: 1
Reputation: 21490
You'll need to clean the string up afterwards; regex just isn't powerful enough.
In fact, what you've got is a context-free grammar! If we call your acceptable tokens an 'id', then you've defined a language that looks like this:
id (('and'|'we'|'us'|'they') id?)*
That is, at least one id; then one of the words and, we, us, or they; then possibly another id. The whole thing then repeats, allowing you to match
mynameisandrewbutheyarebothcalledsarah
as id: mynameis 'and' id: rewbu 'they' id: arebothcalledsarah
So, this is what's known as a context-free language, and regex can't parse that kind of thing. Your best bet is to split on the unacceptable words and just stitch them together at the end.
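A minimal sketch of the split-and-stitch approach, assuming C# 9+ top-level statements: Regex.Split cuts the text at every unacceptable word, and string.Concat joins the remaining id pieces. The helper name StripWords is hypothetical, and it cuts inside longer words exactly as the id/keyword breakdown above does.

```csharp
using System;
using System.Text.RegularExpressions;

// Unacceptable words from the question, used as split points.
var separators = new Regex("and|we|us|they");

// Hypothetical helper: split on the unacceptable words, then stitch the ids back together.
string StripWords(string text)
{
    // Pieces between the removed words; may contain empty strings at the edges.
    string[] ids = separators.Split(text);
    return string.Concat(ids);
}

Console.WriteLine(StripWords("mynameisalexand"));
Console.WriteLine(StripWords("mynameisandrewbutheyarebothcalledsarah"));
```

Concatenating the pieces gives the same result as replacing the words with an empty string. Splitting first is mainly useful if you also want the individual id pieces, such as mynameis, rewbu, and arebothcalledsarah.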
Upvotes: 0