ratnesh

Reputation: 569

Replacing stopwords in a string in Java: how to handle the first word

Hey, I am doing a project in which I have to remove stopwords (or rather certain words; I have a list of about 560 of them) from tweets. I was using the code below:

tweet = tweet.replaceAll(" " + stopword + " ", "");

But there is a problem: the first word can also be a stopword, and it has no space before it. How do I handle a tweet whose first word is a stopword? If you are thinking of

tweet = tweet.replaceAll(stopword + " ", "");

then this won't work, because some stopwords also appear as the ending characters of other words (for example, with the stopword "a", "extra words" becomes "extrwords"). Please suggest a solution. Thanks in advance.
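A minimal sketch reproducing both problems described above. The class and method names (NaiveStopwordDemo, spaceBoth, spaceAfter) are hypothetical and exist only to illustrate the two replaceAll calls from the question.

```java
// Hypothetical demo class: reproduces the two naive approaches from the question.
class NaiveStopwordDemo {
    // First approach: only matches a stopword with a space on both sides,
    // so a stopword at the very start (or end) of the tweet is never removed.
    static String spaceBoth(String tweet, String stopword) {
        return tweet.replaceAll(" " + stopword + " ", "");
    }

    // Second approach: only requires a trailing space, so it also matches
    // the stopword when it is the ending of a longer word.
    static String spaceAfter(String tweet, String stopword) {
        return tweet.replaceAll(stopword + " ", "");
    }

    public static void main(String[] args) {
        // Leading "the" survives: prints "the cat sat"
        System.out.println(spaceBoth("the cat sat", "the"));
        // "breathe " loses its ending "the ": prints "breaair"
        System.out.println(spaceAfter("breathe the air", "the"));
    }
}
```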

Upvotes: 0

Views: 109

Answers (1)

Andy Turner

Reputation: 140494

Use the word-boundary matcher \b:

tweet = tweet.replaceAll("\\b" + Pattern.quote(stopword) + "\\b", "");

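\b matches a word boundary: the zero-width position between a word character and a non-word character (a space, punctuation, etc.), or the start or end of the string next to a word character. So a stopword at the start of the tweet is matched, but a stopword that is only the ending of a longer word is not. See the Javadoc for java.util.regex.Pattern for more details.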
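A minimal sketch applying the pattern above to a whole stopword list, assuming the stopwords are held in a List&lt;String&gt;. The class and method names (StopwordRemover, removeStopwords) are hypothetical, and the final whitespace clean-up is an extra step not in the answer: removing a word leaves its surrounding spaces behind.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: removes whole-word stopwords using \b boundaries.
class StopwordRemover {
    static String removeStopwords(String tweet, List<String> stopwords) {
        String result = tweet;
        for (String stopword : stopwords) {
            // \b on both sides: "the" matches at the start of the tweet,
            // but not inside "breathe" (no boundary between 'a' and 't').
            result = result.replaceAll("\\b" + Pattern.quote(stopword) + "\\b", "");
        }
        // Collapse the runs of spaces left behind and trim the ends.
        return result.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        List<String> stopwords = List.of("the");
        // Prints "cat will breathe air"
        System.out.println(removeStopwords("the cat will breathe the air", stopwords));
    }
}
```

Matching is case-sensitive, so "The" is not removed by "the". With around 560 stopwords this compiles 560 patterns per tweet; if that turns out to be slow, one option is to build the patterns (or a single alternation) once with Pattern.compile and reuse them.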

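I also quoted the stopword with Pattern.quote: it is a variable, so it shouldn't be trusted to contain a valid regex. Without quoting, characters such as "." in a stopword would be interpreted as regex syntax rather than matched literally.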
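A small sketch of why the quoting matters, using a hypothetical stopword "a.m" that contains a regex metacharacter. The class and method names (QuoteDemo, removeUnquoted, removeQuoted) are illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical demo: the same stopword spliced in raw vs. quoted.
class QuoteDemo {
    static String removeUnquoted(String text, String stopword) {
        // Raw splice: "." in the stopword matches ANY character.
        return text.replaceAll("\\b" + stopword + "\\b", "");
    }

    static String removeQuoted(String text, String stopword) {
        // Pattern.quote wraps the stopword in \Q...\E, so it matches literally.
        return text.replaceAll("\\b" + Pattern.quote(stopword) + "\\b", "");
    }

    public static void main(String[] args) {
        String text = "aim for 9 a.m";
        // "aim" is wrongly removed too: prints "[ for 9 ]"
        System.out.println("[" + removeUnquoted(text, "a.m") + "]");
        // Only the literal "a.m" is removed: prints "[aim for 9 ]"
        System.out.println("[" + removeQuoted(text, "a.m") + "]");
    }
}
```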

Upvotes: 3
