Reputation: 5526
I get a set of elements by parsing an HTML document. The elements may contain duplicates. What is the best way to list only the unique elements?
I come from a C++ background and can see a way of doing it using a set and a custom equality operation. However, I am not sure how to do it in Java. I would appreciate any code that helps me do it the right and efficient way.
ArrayList<Element> values = new ArrayList<Element>();
// Parse the HTML and get the document
Document doc = Jsoup.parse(htmlDataInStringFormat);
// Go through each selector and find all matching elements
for (String selector : selectors) {
    // Find elements matching this selector
    Elements elements = doc.select(selector);
    // If there are no matching elements, proceed to the next selector
    if (0 == elements.size()) continue;
    for (Element elem : elements) {
        values.add(elem);
    }
}
if (values.size() > 0) {
    ????? // Need to remove duplicates here
}
Upvotes: 0
Views: 120
Reputation: 5526
While the answers posted work if it is possible to modify the element class, I cannot do that. I do not need a sorted set, but here is the solution I found:
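TreeSet<Element> nt = new TreeSet<Element>(new Comparator<Element>() {
    // Orders elements by val; a result of 0 marks two elements as duplicates
    public int compare(Element a, Element b) {
        if (a == b)
            return 0;
        if (a.val > b.val)
            return 1;
        if (a.val < b.val)
            return -1;
        // Equal val: return 0 so the comparator stays consistent in both directions
        return 0;
    }
});
for (Element elem : elements) {
    nt.add(elem);
}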
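For comparison, here is a minimal self-contained sketch of the same idea: deduplicating through a TreeSet and a comparator without touching the element class. `Item`, its `val` and `label` fields, and `uniqueByVal` are hypothetical names used only for illustration. The sketch assumes that two items with the same `val` count as duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a class that cannot be modified.
class Item {
    final int val;
    final String label;

    Item(int val, String label) {
        this.val = val;
        this.label = label;
    }
}

class ComparatorDedup {
    // Keeps the first item seen for each distinct val; the result is sorted by val.
    static List<Item> uniqueByVal(List<Item> items) {
        TreeSet<Item> unique = new TreeSet<>(Comparator.comparingInt((Item i) -> i.val));
        unique.addAll(items);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item(2, "a"), new Item(1, "b"), new Item(2, "c"));
        for (Item i : uniqueByVal(items)) {
            System.out.println(i.val + " " + i.label);
        }
    }
}
```

A TreeSet decides uniqueness purely from the comparator, so the comparator has to return 0 for every pair that should collapse into one entry.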
Upvotes: 0
Reputation: 1084
java.util.HashSet will give you an unordered set. There are also other implementations of java.util.Set in the API that give you ordered sets or concurrent behaviour if needed.
Depending on what the class Element is, you may additionally need to implement the equals and hashCode methods on it, as per the comments by @musical_coder.
For example:
Set<Element> set = new HashSet<Element>(elements);
To provide an overridden equals method for Element, I would create a thin wrapper around the Element class, called MyElement or something more sensibly named, e.g.:
public static class MyElement extends Element {
    private final Element element;

    public MyElement(Element element) {
        this.element = element;
    }
    // Override equals and hashCode
    // Delegate all other methods
}
and pass that into the set. Of course, this assumes the class isn't final. Effectively, wrap all your elements in this class. Ah, ElementWrapper: that is a better name.
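One way to sidestep the "is it final?" worry is a wrapper that only holds the element and does not extend it. The following is a rough sketch under stated assumptions: `RawElement` is a hypothetical stand-in for the third-party class, and treating two elements as equal when their `tag` and `text` match is an assumed definition, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a third-party class that cannot be edited.
final class RawElement {
    final String tag;
    final String text;

    RawElement(String tag, String text) {
        this.tag = tag;
        this.text = text;
    }
}

// Holds an element and supplies its own notion of equality.
final class ElementWrapper {
    private final RawElement element;

    ElementWrapper(RawElement element) {
        this.element = Objects.requireNonNull(element);
    }

    RawElement unwrap() {
        return element;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ElementWrapper)) return false;
        RawElement other = ((ElementWrapper) o).element;
        return Objects.equals(element.tag, other.tag)
                && Objects.equals(element.text, other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(element.tag, element.text);
    }

    // Wraps each element, lets the set drop duplicates, then unwraps again.
    static List<RawElement> unique(List<RawElement> elements) {
        Set<ElementWrapper> seen = new LinkedHashSet<>();
        for (RawElement e : elements) {
            seen.add(new ElementWrapper(e));
        }
        List<RawElement> result = new ArrayList<>();
        for (ElementWrapper w : seen) {
            result.add(w.unwrap());
        }
        return result;
    }

    public static void main(String[] args) {
        List<RawElement> input = Arrays.asList(
                new RawElement("a", "x"), new RawElement("a", "x"), new RawElement("b", "x"));
        System.out.println(unique(input).size());
    }
}
```

Because the wrapper uses composition, nothing needs to be delegated unless callers want to use the wrapper in place of the element itself.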
Upvotes: 3
Reputation: 176
Additionally, override the equals and hashCode methods of Element:
class Element {
    ...
    public boolean equals(Object o) {
        if (!(o instanceof Element)) {
            return false;
        }
        Element other = (Element) o;
        // Compare the fields of this and other, like
        if (other.a != this.a) { return false; }
        ...
    }
    ...
    public int hashCode() {
        // Compute a value that is the same for any two objects that are equal
    }
}
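When the class is your own to edit, a complete version of that outline might look like the sketch below. `Cell` and its fields `a` and `b` are hypothetical and chosen only to show the pattern. `Objects.equals` and `Objects.hash` come from java.util.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class; the field names are illustrative only.
class Cell {
    private final int a;
    private final String b;

    Cell(int a, String b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Cell)) return false;
        Cell other = (Cell) o;
        return a == other.a && Objects.equals(b, other.b);
    }

    // Built from the same fields as equals, so equal objects hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(a, b);
    }

    public static void main(String[] args) {
        Set<Cell> set = new HashSet<>();
        set.add(new Cell(1, "x"));
        set.add(new Cell(1, "x")); // equal to the first, so not stored again
        set.add(new Cell(2, "x"));
        System.out.println(set.size()); // prints 2
    }
}
```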
Upvotes: 0
Reputation: 34424
Use HashSet if you just want to avoid duplicates. Use TreeSet if you want ordering along with avoiding duplicates.
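To make the difference concrete, here is a small sketch using plain strings. It also shows java.util.LinkedHashSet, which the answer above does not mention: it keeps elements in insertion order. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetOrdering {
    // Unique values in natural (sorted) order.
    static List<String> sortedUnique(List<String> in) {
        return new ArrayList<>(new TreeSet<>(in));
    }

    // Unique values in the order they first appeared.
    static List<String> insertionOrderUnique(List<String> in) {
        return new ArrayList<>(new LinkedHashSet<>(in));
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("pear", "apple", "pear", "fig");
        // HashSet removes duplicates but makes no promise about iteration order.
        System.out.println(new HashSet<>(in).size());   // prints 3
        System.out.println(sortedUnique(in));           // prints [apple, fig, pear]
        System.out.println(insertionOrderUnique(in));   // prints [pear, apple, fig]
    }
}
```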
Upvotes: 1
Reputation: 24134
Add the elements to a java.util.HashSet and it will contain only unique elements.
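As a small illustration of adding elements one at a time: Set.add returns false when the value is already present, which also makes it possible to see which entries were duplicates. The sketch uses strings and a hypothetical helper named `duplicates`. For a custom class, it would rely on that class's equals and hashCode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateReport {
    // Returns every value that was already in the set when it was added.
    static List<String> duplicates(List<String> values) {
        Set<String> seen = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String v : values) {
            if (!seen.add(v)) { // add returns false for a repeat
                dups.add(v);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(duplicates(values)); // prints [a, b, a]
    }
}
```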
Upvotes: 2