Reputation: 139
I have written a method that takes a String, splits it into words and removes every stop word from it. I found a pre-made array of stop words; each word of the string is checked against it and removed if it is found. However, it does not remove all of the stop words.
As you can see, the program does not remove the words "the", "can" and "do".
I am unsure what I am doing wrong and would appreciate any help. Thank you.
import java.util.ArrayList;
public class Analysis {
public static String[] stopwords = {"a", "as", "able", "about", "above", "according", "accordingly", "across", "actually", "after", "afterwards", "again", "against", "aint", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "arent", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by", "cmon", "cs", "came", "can", "cant", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldnt", "course", "currently", "definitely", "described", "despite", "did", "didnt", "different", "do", "does", "doesnt", "doing", "dont", "done", "down", "downwards", "during", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except", "far", "few", "ff", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth", "four", "from", "further", "furthermore", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings", "had", "hadnt", "happens", "hardly", "has", "hasnt", "have", "havent", "having", "he", "hes", "hello", "help", "hence", "her", "here", "heres", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", 
"howbeit", "however", "i", "id", "ill", "im", "ive", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isnt", "it", "itd", "itll", "its", "its", "itself", "just", "keep", "keeps", "kept", "know", "knows", "known", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "lets", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "que", "quite", "qv", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldnt", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "ts", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "thats", "thats", "the", "their", "theirs", "them", 
"themselves", "then", "thence", "there", "theres", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "theyd", "theyll", "theyre", "theyve", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "value", "various", "very", "via", "viz", "vs", "want", "wants", "was", "wasnt", "way", "we", "wed", "well", "were", "weve", "welcome", "well", "went", "were", "werent", "what", "whats", "whatever", "when", "whence", "whenever", "where", "wheres", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whos", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "wont", "wonder", "would", "would", "wouldnt", "yes", "yet", "you", "youd", "youll", "youre", "youve", "your", "yours", "yourself", "yourselves", "zero"};
    public static ArrayList<String> wordsList = new ArrayList<String>();
    public Analysis(){
    }
    public String removeStopWords(){
        String tweet = "Feeling miserable with the cold? Here's what you can do.";
        tweet = tweet.trim().replaceAll("\\s+", " ");
        System.out.println("After trim: " + tweet);
        String[] words = tweet.split(" ");
        for (String word : words) {
            wordsList.add(word);
        }
        System.out.println("After for loop: " + wordsList);
        //remove stop words here from the temp list
        for (int i = 0; i < wordsList.size(); i++) {
            // get the item as string
            for (int j = 0; j < stopwords.length; j++) {
                if (stopwords[j].contains(wordsList.get(i))) {
                    wordsList.remove(i);
                }
            }
        }
        for (String str : wordsList) {
            System.out.print(str + " ");
        }
        return null;
    }
}
Upvotes: 1
Views: 8957
Reputation: 4135
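Replace your nested loops with a single loop over the stop words:
for (int j = 0; j < stopwords.length; j++) {
    // List.remove(Object) removes only the first occurrence,
    // so keep removing while the stop word is still present
    while (wordsList.contains(stopwords[j])) {
        wordsList.remove(stopwords[j]);
    }
}
If wordsList contains any of the stop words in stopwords, they are removed.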
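A minimal, self-contained sketch of this approach, for trying it outside the Analysis class. The class name StopWordDemo, the method removeStopWords(List, String[]) and the shortened STOPWORDS subset are hypothetical; the full array from the question should behave the same way. Matching is exact and case-sensitive, so tokens with punctuation attached, such as "do." or "Here's", are not matched by the stop words "do" or "heres".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StopWordDemo {

    // Hypothetical shortened stop-word list; the question uses a much longer array.
    public static final String[] STOPWORDS = {"the", "can", "do", "with", "what", "you", "heres"};

    // Returns a copy of words with every exact (case-sensitive) stop-word match removed.
    public static List<String> removeStopWords(List<String> words, String[] stopwords) {
        List<String> result = new ArrayList<>(words); // leave the caller's list untouched
        for (String stop : stopwords) {
            // List.remove(Object) removes only the first occurrence, so repeat until none remain
            while (result.remove(stop)) {
                // nothing else to do; remove() returns false once the word is gone
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String tweet = "Feeling miserable with the cold? Here's what you can do.";
        List<String> words = Arrays.asList(tweet.trim().split("\\s+"));
        System.out.println(removeStopWords(words, STOPWORDS));
    }
}
```

Running main prints [Feeling, miserable, cold?, Here's, do.]: "with", "the", "what", "you" and "can" are removed, while "do." and "Here's" survive because of their punctuation.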
Upvotes: 3
Reputation: 88
package stc;
import java.util.ArrayList;
import java.util.List;
public class Stc {
public static void main(String[] args) {
String[] stopwords = {"a", "as", "able", "about",
"above", "according", "accordingly", "across", "actually",
"after", "afterwards", "again", "against", "aint", "all",
"allow", "allows", "almost", "alone", "along", "already",
"also", "although", "always", "am", "among", "amongst", "an",
"and", "another", "any", "anybody", "anyhow", "anyone", "anything",
"anyway", "anyways", "anywhere", "apart", "appear", "appreciate",
"appropriate", "are", "arent", "around", "as", "aside", "ask", "asking",
"associated", "at", "available", "away", "awfully", "be", "became", "because",
"become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being",
"believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both",
"brief", "but", "by", "cmon", "cs", "came", "can", "cant", "cannot", "cant", "cause", "causes",
"certain", "certainly", "changes", "clearly", "co", "com", "come",
"comes", "concerning", "consequently", "consider", "considering", "contain",
"containing", "contains","corresponding","could", "couldnt", "course", "currently",
"definitely", "described", "despite", "did", "didnt", "different", "do", "does",
"doesnt", "doing", "dont", "done", "down", "downwards", "during", "each", "edu",
"eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially",
"et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere",
"ex", "exactly", "example", "except", "far", "few", "ff", "fifth", "first", "five", "followed",
"following", "follows", "for", "former", "formerly", "forth", "four", "from", "further",
"furthermore", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone"
, "got", "gotten", "greetings", "had", "hadnt", "happens", "hardly", "has", "hasnt", "have",
"havent", "having", "he", "hes", "hello", "help", "hence", "her", "here", "heres", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", "howbeit", "however", "i", "id", "ill", "im", "ive", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isnt", "it", "itd", "itll", "its", "its", "itself", "just", "keep", "keeps", "kept", "know", "knows", "known", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "lets", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "que", "quite", "qv", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldnt", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", 
"specify", "specifying", "still", "sub", "such", "sup", "sure", "ts", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "thats", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "theres", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "theyd", "theyll", "theyre", "theyve", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "value", "various", "very", "via", "viz", "vs", "want", "wants", "was", "wasnt", "way", "we", "wed", "well", "were", "weve", "welcome", "well", "went", "were", "werent", "what", "whats", "whatever", "when", "whence", "whenever", "where", "wheres", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whos", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "wont", "wonder", "would", "would", "wouldnt", "yes", "yet", "you", "youd", "youll", "youre", "youve", "your", "yours", "yourself", "yourselves", "zero"};
        ArrayList<String> wordsList = new ArrayList<String>();
        String tweet = "Feeling miserable with the cold? Here's what you can do.";
        tweet = tweet.trim().replaceAll("\\s+", " ");
        System.out.println("After trim: " + tweet);
        String[] words = tweet.split(" ");
        for (String word : words) {
            wordsList.add(word);
        }
        System.out.println("After for loop: " + wordsList);
        // remove stop words, iterating backwards so that removing index i
        // does not shift any word that has not been checked yet
        for (int i = wordsList.size() - 1; i >= 0; i--) {
            for (int j = 0; j < stopwords.length; j++) {
                // compare whole words with equals(); contains() also matches
                // substrings, e.g. "art" inside the stop word "apart"
                if (stopwords[j].equals(wordsList.get(i))) {
                    wordsList.remove(i);
                    break; // this word is gone, so stop comparing it
                }
            }
        }
        for (String str : wordsList) {
            System.out.print(str + " ");
        }
    }
}
This works well.
Upvotes: 0
Reputation: 1501
Your problem is that when you remove a word, you shorten wordsList, but i keeps increasing, so when you access elements by i you skip some words.
For example, suppose wordsList has 5 elements, 0, 1, 2, 3 and 4, each at its matching index. You want to remove elements 2 and 3.
Your loop variable i reaches 2 and removes that element, leaving 0, 1, 3 and 4: the list now has 4 elements, and element 3 has shifted down to index 2. i is then incremented to 3, which takes you to element 4, so element 3 is never checked at all.
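The skipped-element problem described above can be reproduced with a small, self-contained sketch using the same 0 to 4 example. The class and method names (RemoveWhileIterating, forwardRemove, backwardRemove, iteratorRemove) are hypothetical, and the two fixes shown, iterating backwards and removing through an Iterator, are common remedies rather than part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RemoveWhileIterating {

    // Buggy forward loop: after remove(i), the next element slides into index i,
    // but i++ moves past it, so that element is never checked.
    public static List<Integer> forwardRemove(List<Integer> input, List<Integer> toRemove) {
        List<Integer> list = new ArrayList<>(input);
        for (int i = 0; i < list.size(); i++) {
            if (toRemove.contains(list.get(i))) {
                list.remove(i); // int argument: removes by index, not by value
            }
        }
        return list;
    }

    // Fix 1: walk backwards, so removing index i only shifts elements already checked.
    public static List<Integer> backwardRemove(List<Integer> input, List<Integer> toRemove) {
        List<Integer> list = new ArrayList<>(input);
        for (int i = list.size() - 1; i >= 0; i--) {
            if (toRemove.contains(list.get(i))) {
                list.remove(i);
            }
        }
        return list;
    }

    // Fix 2: let an Iterator track the position and remove through it.
    public static List<Integer> iteratorRemove(List<Integer> input, List<Integer> toRemove) {
        List<Integer> list = new ArrayList<>(input);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (toRemove.contains(it.next())) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> elements = Arrays.asList(0, 1, 2, 3, 4);
        List<Integer> toRemove = Arrays.asList(2, 3);
        System.out.println(forwardRemove(elements, toRemove));  // [0, 1, 3, 4]
        System.out.println(backwardRemove(elements, toRemove)); // [0, 1, 4]
        System.out.println(iteratorRemove(elements, toRemove)); // [0, 1, 4]
    }
}
```

The forward loop leaves element 3 behind exactly as described, while both fixes remove 2 and 3.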
Upvotes: 3