Reputation: 107
CompanyName            Desired Output
Abbey Company.Com      abbey company
Manisd Company .com    manisd company
Idely.com              idely
How can I remove ".com" while making sure that the "com" in "company" is not affected? I've tried the code below:
stopwords = c("limited"," l.c.", " llc","corporation"," &"," ltd.","llp ",
"l.l.c","incorporated","association","s.p.a"," l.p.","l.l.l.p","p.a ","p.c ",
"chtd ","chtd. ","r.l.l.l.p ","rlllp ", "the "," lmft", " inc.", ".com")
file_new1$CompanyName<-gsub(paste0(stopwords,collapse = "|"),"", file_new1$CompanyName)
I have already referred to this question:
Remove certain words in string from column in dataframe in R
Upvotes: 2
Views: 152
Reputation: 4482
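You can do gsub("\\.com", "", dt$CompanyName, ignore.case = TRUE), assuming that your data.table is called dt. The ignore.case = TRUE argument makes the pattern match ".Com" as well as ".com".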
UPDATE
Another solution might be to keep only the part before the first dot (".").
For example:
library(data.table)
CompanyName <- data.table(V1=c("Abbey Company.Com", "Manisd Company .com", "Idely.com"))
> CompanyName
V1
1: Abbey Company.Com
2: Manisd Company .com
3: Idely.com
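# sel_strsplit() is not defined in this answer and is not part of base R or data.table;
# a base R equivalent that keeps the text before the first dot and trims whitespace:
CompanyName$V1 <- trimws(sub("\\..*$", "", CompanyName$V1))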
> CompanyName
V1
1: Abbey Company
2: Manisd Company
3: Idely
That way you don't have to care whether you have ".com", ".COM", ".co.uk", etc.
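One caveat with cutting at the first dot: a name that contains an earlier dot is truncated there as well. If that matters for your data, a variant that strips only a trailing suffix may be safer. A minimal sketch in base R; the fourth name and the list of suffixes are made up for illustration and are not from the question:

```r
# The first three names come from the question; the fourth is a hypothetical
# example of a name with a dot that should be kept.
CompanyName <- c("Abbey Company.Com", "Manisd Company .com",
                 "Idely.com", "St. Mary Ltd.co.uk")

# Illustrative suffix list (an assumption; extend it to match your data).
# The "$" anchor restricts the match to the end of the string, and
# [[:space:]]* also removes any whitespace just before the dot.
suffix <- "[[:space:]]*\\.(com|co\\.uk)$"

# ignore.case = TRUE lets ".Com" and ".COM" match as well.
stripped <- sub(suffix, "", CompanyName, ignore.case = TRUE)
stripped
```

The trade-off is that you now have to list the suffixes you expect, whereas cutting at the first dot needs no list but cannot tell "St." from ".com".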
Upvotes: 3
Reputation: 24480
If you have:
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")
You could try:
gsub(paste0(gsub("\\.","\\\\.",stopwords),collapse = "|"),"",
tolower(CompanyName))
#[1] "abbey company" "manisd company " "idely"
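The escaping matters because an unescaped "." matches any character, so after tolower() the pattern ".com" would also match " com" in " company". The result above still has a trailing space in "manisd company "; wrapping the call in trimws() is one way to remove it. A minimal sketch, where the stopwords vector is a shortened, illustrative subset of the one in the question:

```r
# Shortened, illustrative subset of the question's stopwords vector
stopwords <- c("limited", " llc", " inc.", ".com")
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

# Escape every literal dot so "." no longer matches any character
pattern <- paste0(gsub("\\.", "\\\\.", stopwords), collapse = "|")

# Lower-case the names, remove the stopwords, then trim leftover whitespace
cleaned <- trimws(gsub(pattern, "", tolower(CompanyName)))
cleaned
```

Note that only dots are escaped here; if your stopwords contain other regex metacharacters (for example "(" or "+"), those would need escaping too.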
Upvotes: 4