Jayanga Kaushalya

Reputation: 2744

How to remove some words from a string

I want to remove certain words from a string. The words I want to remove are: "a", "an", "and", "the", "of", and "or".

I used the following method:

    void doNoiseEliminator(Vector<String> input) {

        noNoiseLines = new Vector<String>();
        String temp;

        for (int i = 0; i < input.size(); i++) {

            String regex = "(\\sand\\s)|(\\sa\\s)|(\\sthe\\s)|(\\san\\s)|(\\sof\\s)|(\\sor\\s)";
            temp = input.get(i).replaceAll(regex, " ");
            noNoiseLines.add(temp);
        }
    }

But this does not seem to work. My program takes each input line and produces its circular shifts.

For the following input:

MY NAME IS JOHN
MY NAME IS AN A SAM
MY NAME IS OR RAW

The output is:

  1. a sam my name is
  2. is a sam my name
  3. is john my name
  4. is raw my name
  5. john my name is
  6. my name is a sam
  7. my name is john
  8. my name is raw
  9. name is a sam my
  10. name is john my
  11. name is raw my
  12. raw my name is
  13. sam my name is a

The noise word "a" still appears in the output. Why is this happening, and how can I correct it? Thanks!

Upvotes: 1

Views: 6826

Answers (2)

kundan bora

Reputation: 3889

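Try it like this:

noNoiseLines = new Vector<String>();
String temp;

for (int i = 0; i < input.size(); i++) {

    // \b matches at a word boundary without consuming the neighbouring spaces,
    // so adjacent noise words ("an a") are both removed; (?i) ignores case.
    temp = input.get(i).replaceAll("(?i)\\b(?:and|an|a|the|of|or)\\b\\s*", "").trim();
    noNoiseLines.add(temp);
}

Group the alternatives and anchor them with \b. An ungrouped pattern such as " and|an|a|the|of|or " means " and" or "an" or "a" and so on, so it would also strip "an" and "a" from inside other words. Without the \b anchors the order matters too: alternatives are tried left to right, so put and first, then an, then a. If a came before an, it would match only the a of "an" and leave the n behind.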
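To see why the regex from the question leaves "a" behind, here is a small sketch that runs it next to a word-boundary variant. The class name NoiseWordDemo and the two helper methods are made up for illustration, and the word-boundary pattern is only one possible fix, not necessarily what the asker ended up using. In the question's pattern each alternative consumes the whitespace on both sides, so once " an " has matched, the space that " a " needs in front of it is already gone.

```java
import java.util.regex.Pattern;

class NoiseWordDemo {
    // The pattern from the question: each alternative consumes a
    // whitespace character on both sides of the word.
    static final Pattern ORIGINAL = Pattern.compile(
            "(\\sand\\s)|(\\sa\\s)|(\\sthe\\s)|(\\san\\s)|(\\sof\\s)|(\\sor\\s)");

    // Illustrative alternative (an assumption, not from the thread):
    // \b is zero-width, so neighbouring matches do not compete for spaces.
    static final Pattern BOUNDED = Pattern.compile(
            "\\b(?:and|an|a|the|of|or)\\b\\s*", Pattern.CASE_INSENSITIVE);

    static String withOriginal(String line) {
        return ORIGINAL.matcher(line).replaceAll(" ");
    }

    static String withBoundaries(String line) {
        // Remove each noise word plus the whitespace after it, then trim
        // in case the line ended with a noise word.
        return BOUNDED.matcher(line).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String line = "my name is an a sam";
        System.out.println(withOriginal(line));                  // my name is a sam
        System.out.println(withBoundaries(line));                // my name is sam
        System.out.println(withBoundaries("MY NAME IS OR RAW")); // MY NAME IS RAW
    }
}
```

The first call reproduces the "my name is a sam" line from the question's output; the second removes both adjacent noise words.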

Upvotes: 1

pertz

Reputation: 393

To be honest, I didn't fully understand your question, but try the simple way first, without a complex regex; your problem might be there. Then optimize it if needed.

For example, try something like this:

void doNoiseEliminator(Vector<String> input) {

    noNoiseLines = new Vector<String>();
    String temp;

    for (int i = 0; i < input.size(); i++) {

        // One plain replacement per noise word, applied one after another.
        // These match lower-case words only.
        temp = input.get(i)
                .replaceAll(" a ", " ")
                .replaceAll(" an ", " ")
                .replaceAll(" and ", " ")
                .replaceAll(" the ", " ")
                .replaceAll(" of ", " ")
                .replaceAll(" or ", " ");
        noNoiseLines.add(temp);
    }
}

Of course, this shouldn't be the final solution; it's just to check whether it works. Once it does, you can move on to checking and fixing the regex, or to any other solution.
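As one example of "any other solution", here is a minimal sketch that splits each line on whitespace and drops the words found in a set. The class name NoiseFilter, the method removeNoise, and the decision to compare case-insensitively are assumptions made for illustration, not part of the original code. Unlike the chained " a " replacements, it also catches a noise word at the very start or end of a line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NoiseFilter {
    // The six noise words listed in the question.
    static final Set<String> NOISE =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "or"));

    // Hypothetical helper: keeps every word that is not a noise word.
    static String removeNoise(String line) {
        List<String> kept = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            // Compare in lower case so "AN" and "an" are treated alike
            // (an assumption; drop toLowerCase for case-sensitive matching).
            if (!word.isEmpty() && !NOISE.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(removeNoise("MY NAME IS AN A SAM")); // MY NAME IS SAM
        System.out.println(removeNoise("a sam my name is"));    // sam my name is
    }
}
```

Because whole words are compared, there is no risk of removing letters from inside longer words such as "name".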

Hope this helps guide you to the solution.

Upvotes: 2
