A.Info

Reputation: 107

Remove words from a string without affecting other words that contain them

CompanyName            Desired Output
Abbey Company.Com      abbey company
Manisd Company .com    manisd company
Idely.com              idely

How can I remove ".com" while making sure that the "com" in "company" is not affected? I've tried the code below:

    stopwords <- c("limited"," l.c.", " llc","corporation"," &"," ltd.","llp ",
                   "l.l.c","incorporated","association","s.p.a"," l.p.","l.l.l.p","p.a  ","p.c  ",
                   "chtd  ","chtd.  ","r.l.l.l.p  ","rlllp  ", "the "," lmft", " inc.", ".com")

    file_new1$CompanyName <- gsub(paste0(stopwords, collapse = "|"), "", file_new1$CompanyName)

I have already referred to this question:

Remove certain words in string from column in dataframe in R

Upvotes: 2

Views: 152

Answers (2)

quant

Reputation: 4482

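You can do gsub("\\.com", "", dt$CompanyName, ignore.case = TRUE), assuming that your data.table is called dt. The escaped dot matches only a literal ".", and ignore.case = TRUE covers both ".Com" and ".com".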

UPDATE

Another solution might be to keep only the text before the first dot (".").

For example:

library(data.table)
CompanyName <- data.table(V1=c("Abbey Company.Com", "Manisd Company .com", "Idely.com"))

> CompanyName
                    V1
1:   Abbey Company.Com
2: Manisd Company .com
3:           Idely.com

# sel_strsplit() is not defined in this post; sub() keeps the text before the first dot instead
CompanyName$V1 <- sub("\\..*$", "", CompanyName$V1)
> CompanyName
                V1
1:   Abbey Company
2: Manisd Company 
3:           Idely

That way you don't have to care whether you have ".com", ".COM", ".co.uk", etc.
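The same idea can be sketched in base R on a plain character vector. This is only an illustration: the helper name strip_domain is hypothetical, and lowercasing and trimming whitespace are added here to match the desired output in the question.

```r
# Hypothetical helper: drop everything from the first dot onward, then tidy up.
strip_domain <- function(x) {
  # "\\..*$" matches the first literal dot and everything after it
  without_suffix <- sub("\\..*$", "", x)
  # lowercase and remove leading/trailing spaces (e.g. "Manisd Company ")
  trimws(tolower(without_suffix))
}

company <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")
cleaned <- strip_domain(company)
print(cleaned)
```

Note that this cuts at the first dot, so a name with a dot earlier in it (for example an abbreviation such as "A.B.") would be truncated there.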

Upvotes: 3

nicola

Reputation: 24480

If you have:

CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

You could try:

gsub(paste0(gsub("\\.","\\\\.",stopwords),collapse = "|"),"",
     tolower(CompanyName))
#[1] "abbey company"   "manisd company " "idely"
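To see why the escaping matters, here is a small sketch comparing the unescaped and escaped patterns. The stopword list is shortened for illustration, and the trimws() call is an addition (not part of the code above) that removes the trailing space left in "manisd company ".

```r
stopwords <- c(" llc", " inc.", ".com")  # shortened list, for illustration only
company <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

# Unescaped: "." is a regex wildcard, so ".com" also matches " com" in " company"
unescaped <- gsub(paste0(stopwords, collapse = "|"), "", tolower(company))

# Escaped: "\\." in the pattern matches only a literal dot
pattern <- paste0(gsub("\\.", "\\\\.", stopwords), collapse = "|")
escaped <- trimws(gsub(pattern, "", tolower(company)))

print(unescaped)
print(escaped)
```

The unescaped version eats part of "company", which is the problem described in the question; the escaped version only removes the literal ".com".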

Upvotes: 4
