Reputation: 139
Function to be refactored...
<T> T notUsedRandomItem(List<T> allItems, List<T> usedItems) {
return allItems.stream()
.filter(item -> !usedItems.contains(item))
.sorted((o1, o2) -> new Random().nextInt(2) - 1)
.findFirst()
.orElseThrow(() -> new RuntimeException("Did not find item!"));
}
Function might be used like this...
System.out.println(
notUsedRandomItem(
Arrays.asList(1, 2, 3, 4),
Arrays.asList(1, 2)
)
); // Should print either 3 or 4
Edit: Collected suggested implementations and tested efficiency by running them against Person lists.
edit2: Added missing equals method to Person class.
import java.util.*;
import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import static java.util.stream.Collectors.toList;
class Functions {
<T> T notUsedRandomItemOriginal(List<T> allItems, List<T> usedItems) {
return allItems.stream()
.filter(item -> !usedItems.contains(item))
.sorted((o1, o2) -> new Random().nextInt(2) - 1)
.findFirst()
.orElseThrow(() -> new RuntimeException("Did not find item!"));
}
<T> T notUsedRandomItemByAominè(List<T> allItems, List<T> usedItems) {
List<T> distinctItems = allItems.stream()
.filter(item -> !usedItems.contains(item))
.collect(toList());
if (distinctItems.size() == 0) throw new RuntimeException("Did not find item!");
return distinctItems.get(new Random().nextInt(distinctItems.size()));
}
<T> T notUsedRandomItemByEugene(List<T> allItems, List<T> usedItems) {
// this is only needed because your input List might not support removeAll
List<T> left = new ArrayList<>(allItems);
List<T> right = new ArrayList<>(usedItems);
left.removeAll(right);
return left.get(new Random().nextInt(left.size()));
}
<T> T notUsedRandomItemBySchaffner(List<T> allItems, List<T> usedItems) {
Set<T> used = new HashSet<>(usedItems);
List<T> all = new ArrayList<>(allItems);
Collections.shuffle(all);
for (T item : all) if (!used.contains(item)) return item;
throw new RuntimeException("Did not find item!");
}
}
public class ComparingSpeedOfNotUsedRandomItemFunctions {
public static void main(String[] plaa) {
runFunctionsWith(100);
runFunctionsWith(1000);
runFunctionsWith(10000);
runFunctionsWith(100000);
runFunctionsWith(200000);
}
static void runFunctionsWith(int itemCount) {
TestConfiguration testConfiguration = new TestConfiguration();
Functions functions = new Functions();
System.out.println("Function execution time with " + itemCount + " items...");
System.out.println("Schaffner: " +
testConfiguration.timeSpentForFindingNotUsedPeople(
itemCount, (allPeople, usedPeople) ->
functions.notUsedRandomItemBySchaffner(allPeople, usedPeople)
));
System.out.println("Eugene: " +
testConfiguration.timeSpentForFindingNotUsedPeople(
itemCount, (allPeople, usedPeople) ->
functions.notUsedRandomItemByEugene(allPeople, usedPeople)
));
System.out.println("Aominè: " +
testConfiguration.timeSpentForFindingNotUsedPeople(
itemCount, (allPeople, usedPeople) ->
functions.notUsedRandomItemByAominè(allPeople, usedPeople)
));
System.out.println("Original: " +
testConfiguration.timeSpentForFindingNotUsedPeople(
itemCount, (allPeople, usedPeople) ->
functions.notUsedRandomItemOriginal(allPeople, usedPeople)
));
}
}
class TestConfiguration {
Long timeSpentForFindingNotUsedPeople(int numberOfPeople, BiFunction<List<Person>, List<Person>, Person> function) {
ArrayList<Person> people = new ArrayList<>();
IntStream.range(1, numberOfPeople).forEach(i -> people.add(new Person()));
Collections.shuffle(people);
List<Person> halfOfPeople =
people.stream()
.limit(numberOfPeople / 2)
.collect(Collectors.toList());
Collections.shuffle(halfOfPeople);
long before = System.nanoTime();
Person foundItem = function.apply(people, halfOfPeople);
long after = System.nanoTime();
// Return -1 if function do not return valid answer
if (halfOfPeople.contains(foundItem))
return (long) -1;
return TimeUnit.MILLISECONDS.convert(after - before, TimeUnit.NANOSECONDS);
}
class Person {
public final String name = UUID.randomUUID().toString();
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
return name != null ? name.equals(person.name) : person.name == null;
}
@Override
public int hashCode() {
return name != null ? name.hashCode() : 0;
}
}
}
Results:
Function execution time with 100 items...
Schaffner: 0
Eugene: 1
Aominè: 2
Original: 5
Function execution time with 1000 items...
Schaffner: 0
Eugene: 14
Aominè: 13
Original: 5
Function execution time with 10000 items...
Schaffner: 2
Eugene: 564
Aominè: 325
Original: 348
Function execution time with 20000 items...
Schaffner: 3
Eugene: 1461
Aominè: 1418
Original: 1433
Function execution time with 30000 items...
Schaffner: 3
Eugene: 4616
Aominè: 2832
Original: 4567
Function execution time with 40000 items...
Schaffner: 4
Eugene: 10889
Aominè: 4903
Original: 10394
Conclusion
When list size reach 10000 items then so far only Schaffner's implementation is usable.
And because it's fairly simple to read I will pick it as the most elegant solution.
Upvotes: 8
Views: 191
Reputation: 34470
You should use HashSet
s to improve performance:
<T> T notUsedRandomItem(List<T> allItems, List<T> usedItems) {
Set<T> used = new HashSet<>(usedItems);
Set<T> all = new HashSet<>(allItems);
all.removeIf(used::contains); // or all.removeAll(used)
if (all.isEmpty()) throw new RuntimeException("Did not find item!");
int skip = new Random().nextInt(all.size());
Iterator<T> it = all.iterator();
for (int i = 0; i < skip; i++) it.next();
return it.next();
}
This removes elements from the all
set if they belong to the used
set. As Set.removeIf
and Set.contains
are being used, the removal of elements is optimal w.r.t performance. Then, a random number of elements is skipped in the resulting set, and finally, the next element of the set is returned.
Another approach is to shuffle the all
list first and then simply iterate and return the first element that doesn't belong to the used
set:
<T> T notUsedRandomItem(List<T> allItems, List<T> usedItems) {
Set<T> used = new HashSet<>(usedItems);
List<T> all = new ArrayList<>(allItems);
Collections.shuffle(all);
for (T item : all) if (!used.contains(item)) return item;
throw new RuntimeException("Did not find item!");
}
EDIT: Checking the last snippet of code, I now realize that there's no need to shuffle the whole list. Instead, you could randomize the indices of the allItems
list and return the first element that doesn't belong to the used
set:
<T> T notUsedRandomItem(List<T> allItems, List<T> usedItems) {
Set<T> used = new HashSet<>(usedItems);
return new Random().ints(allItems.size(), 0, allItems.size())
.mapToObj(allItems::get)
.filter(item -> !used.contains(item))
.findAny()
.orElseThrow(() -> new RuntimeException("Did not find item!"));
}
Upvotes: 3
Reputation: 56473
The Comparator
you've passed into the sorted
intermediate operation seems wrong and strange way to use a Comparator
to my eyes anyway; which relates to what @Eugene has mentioned in his post.
Thus, I'd recommend you avoid any type of pitfalls and always use an API the way it's intended to be used; nothing more.
if you really want a random element from the said list; the only way that is possible is to find all the distinct elements of the two lists. so we cannot improve the speed in this aspect.
once this is done we simply need to generate a random integer within the range of the list containing the distinct elements and index into it given there is at least one element contained in it.
Though I have to admit there are probably better ways to accomplish the task at hand without the use of streams; here's how I have modified your code slightly to remove the misuse of .sorted((o1, o2) -> new Random().nextInt(2) - 1)
.
<T> T notUsedRandomItem(List<T> allItems, List<T> usedItems) {
List<T> distinctItems = allItems.stream()
.filter(item -> !usedItems.contains(item))
.collect(toList());
if(distinctItems.size() == 0) throw new RuntimeException("Did not find item!");
return distinctItems.get(new Random().nextInt(distinctItems.size()));
}
Upvotes: 1
Reputation: 120988
I can think of this, but no idea what-so-ever how it will scale compared to your existing solution:
<T> T notUsedRandomItem(List<T> allItems, List<T> usedItems) {
// this is only needed because your input List might not support removeAll
List<T> left = new ArrayList<>(allItems);
List<T> right = new ArrayList<>(usedItems);
left.removeAll(right);
return left.get(new Random().nextInt(left.size()));
}
One thing to keep in mind is that sorted
is a stateful operation, so it will sort the entire "diff", but you only retrieve one element from that. Also your Comparator
is wrong, for the same two values o1
and o2
you might say they are different - this can break in mysterious ways.
Upvotes: 4