Reputation: 223
I have the following text:
review <- c("1a. How long did it take for you to receive a personalized response to an internet or email inquiry made to THIS dealership?: Approx. It was very prompt however. 2f. Consideration of your time and responsiveness to your requests.: Were a little bit pushy but excellent otherwise 2g. Your satisfaction with the process of coming to an agreement on pricing.: Were willing to try to bring the price to a level that was acceptable to me. Please provide any additional comments regarding your recent sales experience.: Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! ")
I want to remove every question, i.e. each sentence that ends with a colon, together with the colon itself.
I tried the following code:
gsub("^[^:]+:","",review)
However, it only removed the first sentence ending with a colon.
Expected results:
Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me. Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)!
Any help or suggestions will be appreciated. Thank you.
Upvotes: 1
Views: 143
Reputation: 626689
If the sentences are not complex and contain no abbreviations, you may use:
gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
See the regex demo.
Note that you may generalize it a bit further by changing \\d+[a-zA-Z] to [0-9a-zA-Z]+ or [[:alnum:]]+ so that the prefix matches 1+ digits or letters.
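To see what the generalization changes, here is a minimal sketch on a made-up input. The string x and its labels "12b." and "Q3." are assumptions for illustration and do not come from the question. The stricter prefix \\d+[a-zA-Z] does not recognize "Q3." as a label and leaves it behind, while [[:alnum:]]+ removes it:

```r
# Hypothetical input: one label is digits + letter, the other is letter + digit.
x <- "12b. Was the staff friendly?: Yes, very. Q3. Would you return?: Probably."

# Original prefix: 1+ digits followed by exactly one ASCII letter.
strict <- gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", x)

# Generalized prefix: any run of letters and digits.
general <- gsub("(?:[[:alnum:]]+\\.)?[^.?!:]*[?!.]:\\s*", "", x)

print(strict)   # "Yes, very. Q3.Probably."
print(general)  # "Yes, very. Probably."
```

The looser prefix is only safe when an ordinary word followed by a dot never sits directly in front of a question, so check it against a sample of the real data first.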
Details
(?:\d+[a-zA-Z]\.)? - an optional sequence of
    \d+ - 1+ digits
    [a-zA-Z] - an ASCII letter
    \. - a dot
[^.?!:]* - 0 or more chars other than ., ?, !, :
[?!.] - a ?, ! or .
: - a colon
\s* - 0+ whitespaces
R test:
> gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
[1] "Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me.Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! "
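In the output above, "me.Abel" has lost its space: the last question has no numeric label, so the match starts at the space in front of "Please" and swallows it. One possible workaround, which is a variant of the pattern rather than part of the original answer, is to consume the leading whitespace as well, replace each match with a single space, and trim the result. A minimal sketch on a shortened, made-up input (x is hypothetical):

```r
# Hypothetical shortened input; the second question has no numeric label.
x <- "1a. How fast was the reply?: Very prompt. Any other comments?: Abel is awesome! "

# Pattern from the answer: the space before an unlabelled question is removed too.
plain <- gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", x)

# Variant: match the leading whitespace, put one space back, then trim both ends.
spaced <- trimws(gsub("\\s*(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", " ", x))

print(plain)   # "Very prompt.Abel is awesome! "
print(spaced)  # "Very prompt. Abel is awesome!"
```

trimws() is available in base R from version 3.2.0 onward.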
Extending to handle abbreviations
You may enumerate the exceptions if you add alternation:
gsub("(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*", "", review)
                          ^^^^^^^^^^^^^^^^^^^^^^
Here, (?:i\.?e\.|[^.?!:])* matches 0 or more ie. or i.e. substrings or any chars other than ., ?, ! or :.
See this demo.
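As a quick check of the alternation, here is a small sketch on a made-up question that contains "i.e." (the string y is an assumption for illustration, not taken from the question). The basic pattern cannot get past the dots of the abbreviation and only removes the tail of the question, while the extended pattern removes all of it:

```r
# Hypothetical input: the question text itself contains the abbreviation "i.e.".
y <- "3c. Were the options, i.e. trims and colors, explained clearly?: Yes."

# Basic pattern: the dots inside "i.e." block the match from the start of the question.
basic <- gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", y)

# Extended pattern: "i.e." (or "ie.") is consumed as a single unit.
extended <- gsub("(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*", "", y)

print(basic)     # "3c. Were the options, i.e.Yes."
print(extended)  # "Yes."
```

Other abbreviations, such as e.g., would have to be added as further alternatives in the same group.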
Upvotes: 2