Reputation: 43
I have a real world problem that I believe requires some sort of optimization greater than simple sorts on the arrays of data I am concerned with. I will outline the problem below:
I have a data set comprised of different devices, each having a property A and B. A and B are not dependent on each other, however, I would like to pack 3 of these devices in a particular way.
A values should be similar with respect to each other among the 3 devices selected.
B values should also be similar with respect to each other.
For instance, in this data, there are:
250 devices (having a single value of each A and B)
90 unique values of A
36 unique values of B
Ultimately, these devices should be packed in threes, having a good balance among A values, respectively and B values, respectively. For each property, neighboring bins could also be used if there is not a triplet containing identical values for each criteria.
I would like to group these devices into packages of 3 for as many packages are possible given the constraints.
So my questions are: What sort of combinatorial problem is this and what ways are there to implement it in Python? Any resources regarding these types of problems would be very appreciated as I am rather new to this fascinating subject.
If any part of the explanation is unclear, please let me know and I will try to clarify. Thank you!
Editing for clarity: Here is some example data:
DeviceNumber = [1,2,3,4,5,6]
A = [0.3, 0.3, 0.4, 0.2, 0.3, 0.4]
B = [0.02, 0.04, 0.03, 0.02, 0.02, 0.03]
I would want it to group it so that
module 1 would have devices 1,4,5
module 2 would have devices 2,3,6
Let us assume the data for A and B are normally distributed.
I guess I am trying to minimize the variance between just A values and just B values and finding ways to group them by threes.
Edit 2: So, the data is at work and I'm not at the moment, but here are some graphs I made of the value distributions of A and B for 12 devices
https://i.sstatic.net/aHkhn.jpg
Several devices have the same A value. These devices may also have similar B values among them. If so, I would like to put three of those together, remove them, and repeat checking values and grouping. As matches decrease, I would like to broaden my search criteria for grouping.
I hope that clears up some more of the questions. Thanks again for all the feedback so far!
Upvotes: 4
Views: 524
Reputation: 43497
Your problem appears to be a standard cluster analysis, specifically k-medoids. Given the way k-medoids works, you don't need to remove the clusters from the set, you just have to set k to n / 3.
There doesn't seem to be an "authoritative" k-medoids Python package implementing the algorithm but pyCluster looks reasonable (with only C-based documentation). It is notably absent from SciPy.cluster.
Given the sample data you've presented in the images, you'll wind up with something like this by-eye k-medoids clustering:
Upvotes: 1
Reputation: 1244
If the criteria were actually like the sample data you could make a list of tuples containing
[(a[ii]+b[ii], ii)...]
then sort the list and pull 3 at a time from each end till you meet at the middle. This would get the combinations most like each other.
Upvotes: 0
Reputation: 43497
This is not an answer but the clarification I need won't fit in a comment.
First, if you could make the problem less abstract by saying more about what you are trying to accomplish it would help greatly.
I would want it to group it so that module 1 would have devices 1,4,5
What does {1, 4, 5} signify? is it (0.3, 0.2, 0.3) by drawing from A? I don't think so. Nor does (0.02, 0.02, 0.02) drawing from B. Neither does the union of those two sets make sense.
For each property, neighboring bins could also be used if there is not a triplet containing identical values for each criteria.
How do I select "identical values"?
It looks like there could be an interesting question hiding in there, but it is really hard to understand the task specification.
Upvotes: 0