Reputation: 24499
I have created a method that takes two Collection<String> as input and copies one to the other.
However, I am not sure if I should check if the collections contain the same elements before I start copying, or if I should just copy regardless. This is the method:
/**
 * Copies from one collection to the other. Does not allow empty strings.
 * Removes duplicates.
 * Clears the dest Collection first.
 * @param src
 * @param dest
 */
public static void copyStringCollectionAndRemoveDuplicates(Collection<String> src, Collection<String> dest) {
    if(src == null || dest == null)
        return;
    //Is this faster to do? Or should I just comment this block out
    if(src.containsAll(dest))
        return;
    dest.clear();
    Set<String> uniqueSet = new LinkedHashSet<String>(src.size());
    for(String f : src)
        if(!"".equals(f))
            uniqueSet.add(f);
    dest.addAll(uniqueSet);
}
Maybe it is faster to just remove this check:
if(src.containsAll(dest))
    return;
After all, the method will iterate over the entire collection anyway.
Upvotes: 5
Views: 3078
Reputation: 651
The code is hard to read and is not very efficient. The "dest" parameter is confusing: It's passed as a parameter, then it's cleared and the results are added to it. What's the point of it being a parameter? Why not simply return a new collection? The only benefit I can see is that the caller can determine the collection type. Is that necessary?
I think this code can be more clearly and probably more efficiently written as follows:
public static Set<String> createSet(Collection<String> source) {
    Set<String> destination = new HashSet<String>(source) {
        private static final long serialVersionUID = 1L;
        public boolean add(String o) {
            if ("".equals(o)) {
                return false;
            }
            return super.add(o);
        }
    };
    return destination;
}
Another way is to create your own set type:
public class NonEmptyStringSet extends HashSet<String> {
    private static final long serialVersionUID = 1L;
    public NonEmptyStringSet() {
        super();
    }
    public NonEmptyStringSet(Collection<String> source) {
        super(source);
    }
    public boolean add(String o) {
        if ("".equals(o)) {
            return false;
        }
        return super.add(o);
    }
}
Usage:
createSet(source);
new NonEmptyStringSet(source);
Returning the set is more performant because you don't first have to create a temporary set and then add all to the dest collection.
The benefit of the NonEmptyStringSet type is that you can keep adding strings and still have the empty string check.
EDIT1:
Removing the "if(src.containsAll(dest)) return;" check introduces a "bug" when the method is called with source == dest: dest.clear() then empties the very collection that is being copied from, so source ends up empty. Example:
Collection<String> source = new ArrayList<String>();
source.add("abc");
copyStringCollectionAndRemoveDuplicates(source, source);
System.out.println(source);
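To make the aliasing problem concrete, here is a minimal, self-contained sketch. The class name AliasingDemo and the helper copyWithoutGuard are made up for illustration; the helper is assumed to mirror the question's method with the containsAll() check (and the null check) removed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

class AliasingDemo {
    // Hypothetical variant of the question's method without the containsAll() guard.
    static void copyWithoutGuard(Collection<String> src, Collection<String> dest) {
        dest.clear(); // if src and dest are the same object, this empties src as well
        Set<String> uniqueSet = new LinkedHashSet<String>();
        for (String s : src) {
            if (!"".equals(s)) {
                uniqueSet.add(s);
            }
        }
        dest.addAll(uniqueSet);
    }

    public static void main(String[] args) {
        Collection<String> source = new ArrayList<String>();
        source.add("abc");
        copyWithoutGuard(source, source);
        System.out.println(source.isEmpty()); // true: "abc" was lost
    }
}
```

An explicit src == dest check at the top of the method would probably be a clearer guard than relying on containsAll() to catch this case by accident.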
EDIT2:
I did a small benchmark which shows that my implementation is about 30% faster than a simplified version of your initial implementation. This benchmark is an optimal case for your initial implementation because the dest collection is empty, so it doesn't have to clear it. Also take note that my implementation uses HashSet instead of LinkedHashSet, which makes my implementation a bit faster.
Benchmark code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class SimpleBenchmark {
    public static void main(String[] args) {
        Collection<String> source = Arrays.asList("abc", "def", "", "def", "",
                "jsfldsjdlf", "jlkdsf", "dsfjljka", "sdfa", "abc", "dsljkf", "dsjfl",
                "js52fldsjdlf", "jladsf", "dsfjdfgljka", "sdf123a", "adfgbc", "dslj452kf", "dsjfafl",
                "js21ldsjdlf", "jlkdsvbxf", "dsfjljk342a", "sdfdsa", "abxc", "dsljkfsf", "dsjflasd4" );
        int runCount = 1000000;
        // Run 1: simplified version of the original implementation
        long start1 = System.currentTimeMillis();
        for (int i = 0; i < runCount; i++) {
            copyStringCollectionAndRemoveDuplicates(source, new ArrayList<String>());
        }
        long time1 = (System.currentTimeMillis() - start1);
        System.out.println("Time 1: " + time1);
        // Run 2: NonEmptyStringSet
        long start2 = System.currentTimeMillis();
        for (int i = 0; i < runCount; i++) {
            new NonEmptyStringSet(source);
        }
        long time2 = (System.currentTimeMillis() - start2);
        System.out.println("Time 2: " + time2);
        long difference = time1 - time2;
        double percentage = (double)time2 / (double) time1;
        System.out.println("Difference: " + difference + " percentage: " + percentage);
    }
    public static class NonEmptyStringSet extends HashSet<String> {
        private static final long serialVersionUID = 1L;
        public NonEmptyStringSet() {
        }
        public NonEmptyStringSet(Collection<String> source) {
            super(source);
        }
        @Override
        public boolean add(String o) {
            if ("".equals(o)) {
                return false;
            }
            return super.add(o);
        }
    }
    public static void copyStringCollectionAndRemoveDuplicates(
            Collection<String> src, Collection<String> dest) {
        Set<String> uniqueSet = new LinkedHashSet<String>(src.size());
        for (String f : src)
            if (!"".equals(f))
                uniqueSet.add(f);
        dest.addAll(uniqueSet);
    }
}
Upvotes: 1
Reputation: 718718
I don't really think that I understand why you would want this method, but assuming that it is worthwhile, I would implement it as follows:
public static void copyStringCollectionAndRemoveDuplicates(
        Collection<String> src, Collection<String> dest) {
    if (src == dest) {
        throw new IllegalArgumentException("src == dest");
    }
    dest.clear();
    if (dest instanceof Set) {
        // dest removes duplicates by itself
        dest.addAll(src);
        dest.remove("");
    } else if (src instanceof Set) {
        // src has no duplicates, so only the empty string must be skipped
        for (String s : src) {
            if (!"".equals(s)) {
                dest.add(s);
            }
        }
    } else {
        // neither is a Set: remove duplicates through a temporary set
        HashSet<String> tmp = new HashSet<String>(src);
        tmp.remove("");
        dest.addAll(tmp);
    }
}
Notes:
... src argument in all cases, but the method signature implies that this is irrelevant.
... NullPointerException to be thrown.
Upvotes: 0
Reputation: 114757
I'd say: remove it! It's duplicate 'code': the Set already performs the same contains() operation, so there is no need to preprocess here. Unless you have a huge input collection and a brilliant O(1) test for containsAll() ;-)
The Set is fast enough. Its complexity is O(n) in the size of the input (one contains() and (maybe) one add() operation for every String), and if the target.containsAll() test fails, contains() is done twice for each String, which is less performant.
EDIT
Some code to visualize my answer:
void copy(Collection<String> source, Collection<String> dest) {
    boolean containsAll = true;
    for (String s : source) {          // iteration 1
        if (!dest.contains(s)) {       // contains() test
            containsAll = false;
            break;
        }
    }
    if (!containsAll) {
        for (String s : source) {      // iteration 2
            if (!dest.contains(s)) {   // contains() test
                dest.add(s);
            }
        }
    }
}
If all source elements are in dest, then contains() is called once for each source element. If all but the last source element are in dest (worst case), then contains() is called 2n times (n = size of the source collection): n times in the first iteration, which only fails on the last element, and n times again in the second. Either way, the total number of contains() tests with the extra check is always equal to or greater than for the same code without it.
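The counting argument can be checked with a small sketch. Everything here is made up for illustration: CountingList is a hypothetical ArrayList subclass that counts contains() calls, copyWithPreCheck follows the two iterations described above, and copyDirect is the same copy without the extra test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class ContainsCounter {
    // Illustration only: an ArrayList that counts how often contains() is called.
    static class CountingList extends ArrayList<String> {
        private static final long serialVersionUID = 1L;
        int containsCalls = 0;

        CountingList(Collection<String> initial) {
            super(initial);
        }

        @Override
        public boolean contains(Object o) {
            containsCalls++;
            return super.contains(o);
        }
    }

    // Copy with the extra containsAll-style pre-check (two iterations).
    static void copyWithPreCheck(Collection<String> source, Collection<String> dest) {
        boolean containsAll = true;
        for (String s : source) {
            if (!dest.contains(s)) {
                containsAll = false;
                break;
            }
        }
        if (!containsAll) {
            for (String s : source) {
                if (!dest.contains(s)) {
                    dest.add(s);
                }
            }
        }
    }

    // Copy without the pre-check (single iteration).
    static void copyDirect(Collection<String> source, Collection<String> dest) {
        for (String s : source) {
            if (!dest.contains(s)) {
                dest.add(s);
            }
        }
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c", "d");

        // Worst case for the pre-check: only the last source element is missing.
        CountingList withCheck = new CountingList(Arrays.asList("a", "b", "c"));
        copyWithPreCheck(source, withCheck);

        CountingList direct = new CountingList(Arrays.asList("a", "b", "c"));
        copyDirect(source, direct);

        System.out.println("with pre-check: " + withCheck.containsCalls); // 8
        System.out.println("direct:         " + direct.containsCalls);    // 4
    }
}
```

When dest already holds every source element, both variants make the same number of contains() calls, so the pre-check never comes out ahead in this model.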
EDIT 2
Let's assume we have the following collections:
source = {"", "a", "b", "c", "c"}
dest = {"a", "b"}
First, the containsAll test fails, because the empty String in source is not in dest (this is a small design flaw in your code ;)). Then you create a temporary set, which will be {"a", "b", "c"} (the empty String and the second "c" are ignored). Finally you add everything to dest and, assuming dest is a simple ArrayList, the result is {"a", "b", "a", "b", "c"}. Is that the intention? A shorter alternative:
void copy(Collection<String> in, Collection<String> out) {
    Set<String> unique = new HashSet<String>(in);
    unique.remove(""); // drop the empty string from the copy, not from the input
    out.addAll(unique);
}
Upvotes: 7
Reputation: 66156
Too many confusing parameter names. dest and target have almost the same meaning. You'd better choose something like dest and source. It'll make things much clearer, even for you.
I have a feeling (not sure that it's correct) that you are using the collections API in the wrong way. The Collection interface doesn't say anything about the uniqueness of its elements, but you are adding this quality to it.
Modifying collections that are passed as parameters is not the best idea (but, as usual, it depends). In the general case, mutability is harmful and unnecessary. Moreover, what if the passed collections are unmodifiable/immutable? It's better to return a new collection than to modify the incoming collections.
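A minimal sketch of that point, under the assumption that the caller is happy to receive a Set: the class ReturnNewCollectionDemo and the helper uniqueNonEmpty are hypothetical names, not part of the question's code. The helper leaves its argument untouched, so it also works when the input is unmodifiable.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ReturnNewCollectionDemo {
    // Hypothetical helper: builds and returns a new set, never touches the input.
    static Set<String> uniqueNonEmpty(Collection<String> source) {
        Set<String> result = new LinkedHashSet<String>(source); // keeps first-seen order
        result.remove("");
        return result;
    }

    public static void main(String[] args) {
        List<String> source =
                Collections.unmodifiableList(Arrays.asList("a", "", "b", "a"));

        try {
            source.clear(); // what an in-place method would have to do to a dest argument
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot modify an unmodifiable collection");
        }

        System.out.println(uniqueNonEmpty(source)); // [a, b]
    }
}
```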
The Collection interface has the methods addAll, removeAll and retainAll. Did you try them first? Have you run performance tests on code like:
Collection<String> result = new HashSet<String> (dest);
result.addAll (target);
or
target.removeAll (dest);
dest.addAll (target);
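For reference, here is a rough sketch of what the two variants do to their inputs. The class and method names (BulkOperationsDemo, union, addMissing) are invented for this example, and both collections are assumed to be plain ArrayLists. Note that neither variant filters out the empty string, and the second one keeps duplicates that occur within target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BulkOperationsDemo {
    // Variant 1: union into a new HashSet; neither input is modified.
    static Set<String> union(Collection<String> dest, Collection<String> target) {
        Set<String> result = new HashSet<String>(dest);
        result.addAll(target);
        return result;
    }

    // Variant 2: in place; target loses every element that dest already has.
    static void addMissing(Collection<String> dest, Collection<String> target) {
        target.removeAll(dest);
        dest.addAll(target);
    }

    public static void main(String[] args) {
        List<String> dest = new ArrayList<String>(Arrays.asList("a", "b"));
        List<String> target = new ArrayList<String>(Arrays.asList("a", "b", "c", "c", ""));

        Set<String> merged = union(dest, target);
        System.out.println(merged.size()); // 4: "a", "b", "c" and the empty string

        addMissing(dest, target);
        System.out.println(dest.size());   // 5: "a", "b", "c", "c" and the empty string
        System.out.println(target.size()); // 3: "a" and "b" were removed from target
    }
}
```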
Upvotes: 1
Reputation: 2850
The containsAll() check would not help if target has more elements than dest:
target: [a,b,c,d]
dest: [a,b,c]
target.containsAll(dest) is true, so dest stays [a,b,c] but should be [a,b,c,d].
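The example above can be run as a short sketch. ContainsAllDemo and sameElements are hypothetical names; sameElements is just one possible way to test that two collections hold the same distinct elements, assuming order and duplicates do not matter.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class ContainsAllDemo {
    // Hypothetical helper: true only if both collections hold the same distinct elements.
    static boolean sameElements(Collection<String> a, Collection<String> b) {
        return new HashSet<String>(a).equals(new HashSet<String>(b));
    }

    public static void main(String[] args) {
        List<String> target = Arrays.asList("a", "b", "c", "d");
        List<String> dest = Arrays.asList("a", "b", "c");

        System.out.println(target.containsAll(dest)); // true, although "d" is missing from dest
        System.out.println(dest.containsAll(target)); // false
        System.out.println(sameElements(target, dest)); // false
    }
}
```

containsAll() only tests one direction (subset), so it cannot stand in for an equality check on its own.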
I think the following code is more elegant:
Set<String> uniqueSet = new LinkedHashSet<String>(target.size());
uniqueSet.addAll(target);
if(uniqueSet.contains(""))
uniqueSet.remove("");
dest.addAll(uniqueSet);
Upvotes: 3
Reputation: 66876
You could benchmark it, if it matters that much. I think the call to containsAll() likely does not help, though it could depend on how often the two collections have the same contents.
But this code is confusing. It's trying to add new items to dest? So why does it clear it first? Just return your new uniqueSet to the caller instead. And isn't your containsAll() check reversed?
Upvotes: 2