Reputation: 310
my question is more of algorithm design nature than about programming. I have 6 buildings in my dataset and a table with distances from each building to each building:
| From_Building_ID | To_Building_ID | Distance_Mile |
+------------------+----------------+---------------+
| 1368 | 10692 | 167.201 |
| 1368 | 10767 | 216.307 |
| 1368 | 6377 | 359.002 |
| 1368 | 10847 | 362.615 |
| 1368 | 10080 | 67.715 |
| 6377 | 10692 | 488.3 |
| 6377 | 1368 | 359.002 |
| 6377 | 10080 | 327.024 |
| 6377 | 10767 | 150.615 |
| 6377 | 10847 | 41.421 |
| 10080 | 10847 | 330.619 |
| 10080 | 6377 | 327.024 |
| 10080 | 10767 | 184.329 |
| 10080 | 10692 | 166.549 |
| 10080 | 1368 | 67.715 |
| 10692 | 1368 | 167.201 |
| 10692 | 10767 | 345.606 |
| 10692 | 6377 | 488.3 |
| 10692 | 10847 | 491.898 |
| 10692 | 10080 | 166.549 |
| 10767 | 1368 | 216.307 |
| 10767 | 10692 | 345.606 |
| 10767 | 10080 | 184.329 |
| 10767 | 10847 | 154.22 |
| 10767 | 6377 | 150.615 |
| 10847 | 6377 | 41.4211 |
| 10847 | 10692 | 491.898 |
| 10847 | 1368 | 362.615 |
| 10847 | 10080 | 330.619 |
| 10847 | 10767 | 154.22 |
+------------------+----------------+---------------+
My goal is to get a short table that includes unique combination of buildings. If a combination between any two buildings has already appeared it should not appear twice, so eventually I should end up with half the number of rows of the original set. I will then sum up the distances (for compensation purposes). the end result should look similar to this:
+------------------+----------------+---------------+
| From_Building_ID | To_Building_ID | Distance_Mile |
+------------------+----------------+---------------+
| 1368 | 10692 | 167.201 |
| 1368 | 10767 | 216.307 |
| 1368 | 6377 | 359.002 |
| 1368 | 10847 | 362.615 |
| 1368 | 10080 | 67.715 |
| 6377 | 10692 | 488.3 |
| 6377 | 10080 | 327.024 |
| 6377 | 10767 | 150.615 |
| 6377 | 10847 | 41.421 |
| 10080 | 10847 | 330.619 |
| 10080 | 10767 | 184.329 |
| 10080 | 10692 | 166.549 |
| 10692 | 10767 | 345.606 |
| 10692 | 10847 | 491.898 |
| 10767 | 10847 | 154.22 |
+------------------+----------------+---------------+
I created a class in C# with the appropriate properties:
class Distances
{
public int FromBuldingID { get; set; }
public int ToBuildingID { get; set; }
public double Distance_Mile { get; set; }
public Distances(int f, int t, double mile)
{
FromBuldingID = f;
ToBuildingID = t;
Distance_Mile = mile;
}
}
and created a List<Distances> dist
that contains all the distances as described.
I tried to select distinct distances, but the data is not reliable, so it's not a viable option,
(for example the distances between 6377 10847
and 10847 6377
are not the same).
I am trying now to design my algorithm, without much success so far:
for (int i = 0; i < dist.Count; i++)
{
if (true)// what may the condition be?
{
}
}
Any help would be appreciated. Thanks!
Upvotes: 0
Views: 85
Reputation: 31
The other answers using LINQ are valid but be aware using LINQ is generally a choice of readability vs performance. If you wished your algorithm to be able to scale performance-wise to much larger datasets, you can use dictionarys with value tuples as keys to achieve fast duplicate checking for each combination in the list when looping through.
Dictionary<ValueTuple<int, int>, boolean> uniqueCombinations = new Dictionary<ValueTuple<int, int>, boolean>();
Be aware value tuples are only available from C# 7.0 onwards. Otherwise you can use standard tuples as the key which will have a performance decrease but the dictionary structure should still make it faster than using LINQ. Tuples are the cleanest way of using unique pairs for dictionary keys since arrays are awkward to compare, using hashcodes to compare rather than the actual values in it.
Insertion should be be done in (toBuildingId, fromBuildingId) order while checking for duplicates in the dictionary should be reverse order with (fromBuildingId, toBuildingId). The boolean value is largely unnecessary but a value is needed to use the unique properties of the Dictionary data structure for fast checking of duplicates.
Upvotes: 0
Reputation: 37060
One way to think about this problem is to consider that we want to use the System.Linq
extension method, Distinct()
to filter our duplicate items, but that method uses the class's default equality comparer to determine if two instances are equal, and the default comparer uses a reference comparison, which doesn't work for our scenario.
Since we want to consider two instances equal if either their FromBuildingId
and ToBuildindId
properties are equal, or if one's FromBuildingId
equals the other's ToBuildingId
, and it's ToBuildingId
equals the other's FromBuildingId
, we need to override the class's default Equals
(and GetHashCode
) method with that logic:
public class Distance
{
public int FromBuildingId { get; set; }
public int ToBuildingId { get; set; }
public double TotalMiles { get; set; }
public Distance(int fromBuildingId, int toBuildingId, double totalMiles)
{
FromBuildingId = fromBuildingId;
ToBuildingId = toBuildingId;
TotalMiles = totalMiles;
}
public override bool Equals(object obj)
{
var other = obj as Distance;
return other != null &&
(other.FromBuildingId == FromBuildingId && other.ToBuildingId == ToBuildingId) ||
(other.FromBuildingId == ToBuildingId && other.ToBuildingId == FromBuildingId);
}
public override int GetHashCode()
{
unchecked
{
return 17 * (FromBuildingId.GetHashCode() + ToBuildingId.GetHashCode());
}
}
}
With this done, we can now use the Distinct
method on our list:
var distances = new List<Distance>
{
new Distance(1, 2, 3.4),
new Distance(2, 1, 3.3), // Should be considered equal to #1
new Distance(5, 6, 7.8),
new Distance(5, 6, 7.2) // Should be considered equal to #3
};
// remove duplicates
var uniqueDistances = distances.Distinct().ToList();
// uniqueDistnaces will only have 2 items: the first and the third from distances.
And then it's just one more extension method to get the Sum
of the distinct distances:
var sum = distances.Distinct().Sum(d => d.TotalMiles);
Upvotes: 2
Reputation: 21477
One way:
var uniques = dist.Where(d=>d.FromBuildingID < d.ToBuildingID).ToList();
A more robust way, which will take both A:B and B:A and use the one with the smallest Distance_Mile, and throw out the other.
var uniques = dist
.GroupBy(d=>new {
a=Math.Min(d.FromBuildingID, d.ToBuildingID),
b=Math.Max(d.FromBuildingID, d.ToBuildingID)
}).Select(d=>d.OrderBy(z=>z.Distance_Mile).First())
.ToList();
In either case, if you just want the sum, instead of the final .ToList()
, just put .Sum(d=>d.Distance_Mile)
Upvotes: 4