Reputation: 1700
I need to remove an entire sentence from a string if it contains a pattern. Here the pattern is "Link" or "link"; if it is present in the string, I need to remove the whole sentence containing it.
std::string subject = "This is previous sentence. This can be any sentences. Link 2.1.19.3 [Example]. This is can be any other sentence. This is next sentence.";
std::string removeRedundantString(std::string subject)
{
    std::string removeSee = subject;
    std::smatch match;
    std::regex redundantSee("(Link.*$)");
    if (std::regex_search(subject, match, redundantSee))
    {
        removeSee = std::regex_replace(subject, redundantSee, "");
    }
    return removeSee;
}
Expected Output :
This is previous sentence. This can be any sentences.This is can be any other sentence. This is next sentence.
Actual Output :
This is previous sentence. This can be any sentences.
The actual output comes from the regex "(Link.*$)", which removes everything from Link to the end of the string.
I am not able to figure out which regex would produce the expected output.
Here are the test cases I need to cover:
Testcase 1:
std::string subject = "Note this is second pattern, Ops that next the scheduler; link the amount for the full list of docs. The number of value varies from 0 to 4.";
Output: Note this is second pattern, Ops that next the scheduler;The number of value varies from 0 to 4.
Testcase 2:
std::string subject = "This is another pattern. (Link Doc::78::hello::Core::mount). Since this patern includes non-numeric value.";
Output : This is another pattern.Since this patern includes non-numeric value.
Any help would be appreciated.
Upvotes: 5
Views: 353
Reputation: 153
First, keep in mind that the end of a string is not the next . character; it is the end of the memory region that the variable subject refers to. So when you match the end of the string with $, the regex engine goes all the way to the end of that region.
For instance, given string str = "......................."; the end of the string is the last . character.
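To make this concrete, here is a minimal sketch that returns the text the question's pattern "(Link.*$)" actually matches. The helper name matchedByLinkToEnd is hypothetical (it is not from the question), and the sketch assumes the default ECMAScript grammar of std::regex and a subject without line breaks.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text matched by the question's
// pattern "(Link.*$)", or an empty string when nothing matches.
std::string matchedByLinkToEnd(const std::string& subject)
{
    static const std::regex linkToEnd("(Link.*$)");
    std::smatch match;
    if (std::regex_search(subject, match, linkToEnd))
    {
        // ".*" is greedy and "$" anchors at the end of the input,
        // so the match runs from "Link" to the last character.
        return match.str();
    }
    return "";
}
```

For a subject such as "A. Link 1.2 [X]. B. C." the returned text runs from Link through the final C., which is why every sentence after the Link sentence disappears as well.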
What you are trying to do, I suppose, is match from the word "link" up to the next . character. For this, you should define a character set (consisting of uppercase and lowercase letters, digits, spaces, and colons, according to your test cases).
A regex resembling link([0-9a-z ]*)\. can be used, with the character class extended to cover the characters you need.
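As a rough sketch of that idea, the function below uses a character class extended with uppercase letters and colons, and accepts both "Link" and "link". The function name removeLinkSentenceSimple and the exact character class are assumptions made for illustration, not a tested solution for every input.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes "Link"/"link" plus any run of letters,
// digits, spaces, and colons that follows it, up to the next '.'.
std::string removeLinkSentenceSimple(const std::string& subject)
{
    static const std::regex linkSentence(R"([Ll]ink[0-9A-Za-z :]*\.)");
    return std::regex_replace(subject, linkSentence, "");
}
```

This handles a sentence like the one in Testcase 1, but it has limits worth checking: a character outside the set (such as the closing parenthesis in Testcase 2) prevents a match, and a dotted number such as 2.1.19.3 ends the match at its first dot. Whitespace around the removed sentence is also left in place.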
Also, before you use them, I suggest testing your regexes with a tool such as RegExr.
Upvotes: 0
Reputation: 627087
I'd recommend
std::regex redundantSee(R"(\W*\b[Ll]ink\b(?:\d+(?:\.\d+)*|[^.])*[.?!])");
See its online demo. Note the raw string literal syntax, R"(...)". The pattern can simply be placed where the ... is, without any additional escaping.
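For context, here is a minimal sketch of how that pattern might be dropped into the function from the question. The const-reference parameter and the static regex object are my own choices rather than part of the recommendation above.

```cpp
#include <regex>
#include <string>

// Sketch: removes every sentence containing the word "Link" or "link",
// using the pattern recommended above (default ECMAScript grammar).
std::string removeRedundantString(const std::string& subject)
{
    // Raw string literal, so the backslashes need no extra escaping.
    static const std::regex redundantSee(
        R"(\W*\b[Ll]ink\b(?:\d+(?:\.\d+)*|[^.])*[.?!])");
    // regex_replace returns the subject unchanged when nothing matches,
    // so a separate regex_search call is not needed.
    return std::regex_replace(subject, redundantSee, "");
}
```

Because the leading \W* is greedy, it can also consume the punctuation and spaces that precede the Link sentence, so the punctuation and spacing of the result may differ slightly from the expected output in the question. It is worth running all three test strings through it before relying on it.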
Regex details:
\W* - zero or more non-word chars
\b - a word boundary
[Ll]ink - Link or link word
\b - a word boundary
(?:\d+(?:\.\d+)*|[^.])* - zero or more sequences of
    \d+(?:\.\d+)* - one or more digits followed with zero or more sequences of . and one or more digits
    | - or
    [^.] - any char other than a .
[.?!] - a ?, . or !.
Upvotes: 3