user3766930

Reputation: 5829

How can I eliminate duplicate words from a String in Java?

I have an ArrayList of Strings and it contains records such as:

this is a first sentence
hello my name is Chris 
what's up man what's up man
today is tuesday

I need to clear this list, so that the output does not contain repeated content. In the case above, the output should be:

this is a first sentence
hello my name is Chris 
what's up man
today is tuesday

As you can see, the third String has been modified and now contains the statement what's up man only once instead of twice. In my list, some Strings are correct and some are doubled as shown above.

I want to get rid of it, so I thought about iterating through this list:

for (String s: myList) {

but I cannot find a way to eliminate the duplicates, especially since the length of each String is not fixed. There might be a long record:

this is a very long sentence this is a very long sentence

or sometimes short ones:

single word single word

Is there a native Java function for this?

Upvotes: 3

Views: 14684

Answers (6)

umesh verma

Reputation: 11

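// Java 8: removes consecutive duplicate words, ignoring case.
// Needs java.util.Arrays and java.util.stream.Collectors; place inside a method that returns String.

String str1 = "I am am am a good Good coder";
String[] arrStr = str1.split(" ");
// One-element array so the lambda can update the last word that was kept
String[] element = new String[1];
return Arrays.stream(arrStr).filter(word -> {
    // word is the current token; element[0] holds the previously kept token
    if (!word.equalsIgnoreCase(element[0])) {
        element[0] = word;
        return true;
    }
    return false;
}).collect(Collectors.joining(" "));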

Upvotes: 1

minigeek

Reputation: 3166

Simple logic: split the string on spaces (" "), add each word to a LinkedHashSet, which drops duplicates but keeps insertion order, then join the words back together with spaces.

 String s = "I want to walk my dog I want to walk my dog";
 Set<String> temp = new LinkedHashSet<>();
 String[] arr = s.split(" ");

 for (String ss : arr)
      temp.add(ss);

 // Join the unique words with single spaces
 String newl = String.join(" ", temp);

 System.out.println(newl);

Output: I want to walk my dog

Upvotes: 1

Veneet Reddy

Reputation: 2907

Assumptions:

  1. Uppercase words are treated as equal to their lowercase counterparts.

String fullString = "lol lol";
// Split on whitespace so that words such as "what's" stay intact
String[] words = fullString.trim().split("\\s+");
StringBuilder stringBuilder = new StringBuilder();
Set<String> wordsHashSet = new HashSet<>();

for (String word : words) {
    // Set.add returns false when the lowercased word has already been seen
    if (!wordsHashSet.add(word.toLowerCase())) continue;

    stringBuilder.append(word).append(" ");
}
String nonDuplicateString = stringBuilder.toString().trim();

Upvotes: 1

vhula

Reputation: 497

I would suggest using regular expressions. I was able to remove duplicates using this pattern: \b([\w\s']+) \1\b

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static String [] phrases = {
            "this is a first sentence",
            "hello my name is Chris",
            "what's up man what's up man",
            "today is tuesday",
            "this is a very long sentence this is a very long sentence",
            "single word single word",
            "hey hey"
    };
    public static void main(String[] args) throws Exception {
        String duplicatePattern = "\\b([\\w\\s']+) \\1\\b";
        Pattern p = Pattern.compile(duplicatePattern);
        for (String phrase : phrases) {
            Matcher m = p.matcher(phrase);
            if (m.matches()) {
                System.out.println(m.group(1));
            } else {
                System.out.println(phrase);
            }
        }
    }
}

Results:

this is a first sentence
hello my name is Chris
what's up man
today is tuesday
this is a very long sentence
single word
hey

Upvotes: 2

airos

Reputation: 752

Assuming the String is repeated just twice, with a space in between as in your examples, the following code would remove the repetitions:

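for (int i = 0; i < myList.size(); i++) {
    String s = myList.get(i);
    int mid = s.length() / 2;
    // A doubled entry looks like "X X", so it has a space exactly in the middle.
    // The length check skips empty and one-character strings.
    if (s.length() < 3 || s.charAt(mid) != ' ') {
        continue;
    }
    String fs = s.substring(0, mid);
    String ls = s.substring(mid + 1);
    if (fs.equals(ls)) {
        myList.set(i, fs);
    }
}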

The code splits each entry of the list into two substrings at the halfway point. If both halves are equal, it replaces the original element with one half, thus removing the repetition.

I was testing the code and did not see @Brendan Robert's answer. This code follows the same logic as his answer.

Upvotes: 2

Brendan Robert

Reputation: 95

It depends on your situation. Assuming the string can be repeated at most twice, you could take the length of the entire string, find the halfway point, and compare each character after the halfway point with the character at the matching index in the first half. If the string can be repeated more than twice, you will need a more complicated algorithm that first determines how many times the string is repeated, then finds the starting index of each repeat and truncates everything from the beginning of the first repeat onward. If you can provide more context about the scenarios you expect to handle, we can start putting together some ideas.
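
A rough sketch of the more general case described above, under a few assumptions: the copies are separated by whitespace, the phrase may be repeated two or more times, and the comparison is done word by word rather than character by character (a swapped-in simplification, not the exact index comparison described). The class name RepeatCollapser and the method collapse are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RepeatCollapser {

    // Hypothetical helper: returns the shortest leading phrase whose
    // repetition (two or more copies) makes up the whole input.
    // If the input is not a repeated phrase, it is returned unchanged.
    static String collapse(String s) {
        String[] words = s.trim().split("\\s+");
        int n = words.length;
        // Try every candidate phrase length, shortest first.
        for (int unit = 1; unit <= n / 2; unit++) {
            if (n % unit != 0) {
                continue; // the copies must fill the input exactly
            }
            boolean repeats = true;
            // Each word must equal the word one phrase length earlier.
            for (int i = unit; i < n && repeats; i++) {
                repeats = words[i].equals(words[i - unit]);
            }
            if (repeats) {
                return String.join(" ", Arrays.copyOfRange(words, 0, unit));
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> myList = new ArrayList<>(Arrays.asList(
                "this is a first sentence",
                "what's up man what's up man",
                "hey hey hey"));
        // Replace each entry with its collapsed form, in place.
        myList.replaceAll(RepeatCollapser::collapse);
        myList.forEach(System.out::println);
        // prints:
        // this is a first sentence
        // what's up man
        // hey
    }
}
```

Note that this also collapses sentences that legitimately consist of one repeated phrase, so whether that is acceptable depends on your data.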

Upvotes: 0
