Reputation: 2860
I have tried
gsub("/^(http?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$/","","This is a website http://www.example.com/test and needs to be removed",ignore.case=T, perl=T)
The pattern is from: this website
The code runs but doesn't remove the URL. Any ideas?
Upvotes: 0
Views: 63
Reputation: 110024
The rm_url function from the qdapRegex package, which I maintain, is made for this. It has the added benefit of correcting the extra white space left behind:
library(qdapRegex)
rm_url("This is a website http://www.example.com/test and needs to be removed")
## [1] "This is a website and needs to be removed"
If you're interested in the regex behind rm_url, you can use the grab function on any qdapRegex function that uses a single regex to see the expression it uses:
grab("rm_url")
## [1] "(http[^ ]*)|(ftp[^ ]*)|(www\\.[^ ]*)"
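If you'd rather avoid the package dependency, the expression printed above can be passed straight to base R's gsub. This is only a minimal sketch: the whitespace cleanup (collapsing repeated spaces and trimming) is my own addition to approximate what rm_url does, not the package's actual implementation, and the variable names are illustrative.

```r
# Pattern copied from the grab("rm_url") output above
url_pattern <- "(http[^ ]*)|(ftp[^ ]*)|(www\\.[^ ]*)"

x <- "This is a website http://www.example.com/test and needs to be removed"

# Step 1: delete anything that looks like a URL
no_url <- gsub(url_pattern, "", x)

# Step 2 (assumption: approximates rm_url's cleanup): collapse runs of
# spaces into one and trim leading/trailing whitespace
clean <- trimws(gsub(" +", " ", no_url))

clean
## [1] "This is a website and needs to be removed"
```

Note that this pattern is deliberately loose: it removes any run of non-space characters starting at "http", "ftp" or "www.", so it can also clip ordinary words that happen to contain those letters.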
Upvotes: 0
Reputation: 68820
Remove:
^ and $, which match the start and end of the line;
/, which is a delimiter and is not required by gsub;
the space in the last character class ([\\/\\w \\.-]), which stops you from matching only the URL (currently, the pattern consumes everything up to the end of the line).
gsub("(http?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w\\.-]*)*\\/?","","This is a website http://www.example.com/test and needs to be removed",ignore.case=T, perl=T)
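To see why the original attempt returns the string untouched, the two patterns can be compared side by side. This is a rough sketch, not the answerer's exact code: the variable names are hypothetical, and I have swapped http? for https? on the assumption that https links should be removed too (http? makes the "p" optional rather than allowing an "s").

```r
x <- "This is a website http://www.example.com/test and needs to be removed"

# Original pattern: the leading "/" is treated as a literal character,
# and ^ can never match after it, so nothing in x is matched
original <- "/^(http?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$/"

# Corrected pattern: no delimiters, no anchors, no space in the last class
# (assumption: https? instead of http? so https URLs are covered as well)
corrected <- "(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w\\.-]*)*\\/?"

# TRUE means gsub made no replacement at all
unchanged <- identical(gsub(original, "", x, ignore.case = TRUE, perl = TRUE), x)
unchanged
## [1] TRUE

removed <- gsub(corrected, "", x, ignore.case = TRUE, perl = TRUE)
removed
## [1] "This is a website  and needs to be removed"
```

The corrected call leaves a double space where the URL used to be; a follow-up gsub(" +", " ", removed) collapses it if that matters.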
Upvotes: 1