Reputation: 9478
I need clarification whether my approach is right or wrong any modifications required.
Let me explain clearly. I will have a excel file in which there will be country code country name years(mm/yyyy) as extra 10 columns
countrycode country Name 12/2000 11/2000 10/2000 09/2000 08/2000 07/2000 06/2000 05/2000 04/2000 03/2000 02/2000 01/2000
IND India 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9 11.1 11.2 11.3
USA Uinted States 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 9.1 9.2 9.3
In a row if anyof the price is repeated for that particular year and country , i need to show message as Duplicate present in Excel file.
For the above , i implemented by this way. For a VO i override the hashCode()
with the hashcode of (coutrycode + year + price) and equals
method too and
while inserting in database i pass this VO to HashSet
and I eliminate duplicate and compare the size of original list size with HashSet
size.
But sometime if there is unique price also I am getting message as duplicate.
Please suggest me my approach is right or wrong or another way I can implement.
Upvotes: 4
Views: 382
Reputation: 106
If you are just parsing all the values into Strings then your approach sounds logical to me.
I read your description. You seem to say that a unexpected duplicates are detected. So this really means that 'equals' method is not behaving as you expect I think. If 'hashCode' was incorrect, I think you would get the opposite problem (duplicate NOT detected).
If you are still experiencing issues then attach the implementation of 'hashCode' and 'equals' and it might help to quickly answer the problem.
One more thing. I assume that all sample countries are unique in the file? I mean no countries are duplicated later on in the file?
Upvotes: 0
Reputation: 20048
It is perfectly valid to write code like:
List<CountryInstance> list = ...;
Set<CountryInstance> set = new HashSet<CountryInstance>(list);
if(set.size() < list.size()){
/* There are duplicates */
For it to work you need value class instances. To create one you need to override equals and hashcode. Before you do that read What issues should be considered when overriding equals and hashCode in Java?
Upvotes: 0
Reputation: 4859
Hashbased collections depends on the hashcode()
and equals()
methods to correctly identify duplicates. If you modify these to fit exactly one usecase you are probably likely to have all sorts of side-effects in other use cases.
To say it more explicitly. If you change the methods of your VO to use only a subset of the data, you are likely to encounter unforeseen problems some where else where you might store VOs in hashbased collections.
You should keep hashcode()
and equals()
consistent with data equality, i.e. using all attributes for tests, as suggested in many sources (Source generators in eclipse, @EqualsAndHashcode annotations from Lombok, 'Effective Java' by Joshua Bloch, etc.).
In your explicit case you could create a specific wrapper to calculate your hashcodes and equality based on the subset.
As an example:
public void doit(List<VO> vos) {
Set<VOWrapper> dups = new HashSet<>();
for (VO vo : vos) {
if (dups.contains(new VOWrapper(vo))) {
System.out.println("Found a duplicate");
} else {
dups.add(new VOWrapper(vo));
// Process vo
}
}
}
Based on this VO
@Data // Lombok generates getters/setters/equals/hashcode (using all fields)
public class VO {
private String countrycode;
private String country;
private int month;
private int year;
private double price;
}
And this wrapper
public class VOWrapper {
private final VO vo;
public VOWrapper(VO vo) { this.vo = vo; }
// Equals method with only 3 fields used
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
VO other = ((VOWrapper) obj).vo;
if (vo.getCountry() == null) {
if (other.getCountry() != null)
return false;
} else if (!vo.getCountry().equals(other.getCountry()))
return false;
if (vo.getCountrycode() == null) {
if (other.getCountrycode() != null)
return false;
} else if (!vo.getCountrycode().equals(other.getCountrycode()))
return false;
if (Double.doubleToLongBits(vo.getPrice()) != Double.doubleToLongBits(other.getPrice()))
return false;
return true;
}
//Hashcode method with only 3 fields used
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((vo.getCountry() == null) ? 0 : vo.getCountry().hashCode());
result = prime * result + ((vo.getCountrycode() == null) ? 0 : vo.getCountrycode().hashCode());
long temp;
temp = Double.doubleToLongBits(vo.getPrice());
result = prime * result + (int) (temp ^ (temp >>> 32));
return result;
}
}
Upvotes: 0
Reputation: 6276
Buddy you have taken the right thought and approach to solve the problem but just missing a little edge (information) to solve the problem correctly.
I would like to provide a little hint, that I believe can help and rectify the problem and understand the basics really very well.
If you look at the documentation (or the source code) of hashCode
for the String
and Double
variables, it states
STRING
Returns a hash code for this string. The hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
Returns: a hash code value for this object.
DOUBLE
Returns a hash code for this Double object. The result is the exclusive OR of the two halves of the long integer bit representation, exactly as produced by the method
doubleToLongBits(double)
, of the primitive double value represented by this Double object. That is, the hash code is the value of the expression:
(int)(v^(v>>>32))
where v is defined by:
long v = Double.doubleToLongBits(this.doubleValue());
Returns: a hash code value for this object.
So the hashCode()
function returns a unique value in most case, but there are so many cases when the it returns the same int
value for the two objects.
I think you are also getting caught in the same scenario.
A little more hint, you can use the HashMap<Integer,List<String>>
where the Integer
value is hashCode
as you calculated and the List<String>
is the collection of actual value got by forming the String
from coutrycode + year + price .
And the last part is comparison, you can get the List<String>
at the calculated hashCode()
of new value and check if the same String
value do exists in the List
.
Upvotes: 4