I have a sample of JSON data that I am converting to a JArray with Newtonsoft Json.NET.
string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['sunroof']},{'features': ['mag wheels']},{'features': ['sunroof','mag wheels','spoiler']},{'features': ['sunroof','spoiler']},{'features': ['sunroof','mag wheels']},{'features': ['spoiler']}]";
I am trying to retrieve the features that are most commonly requested together. Based on the above dataset, my expected output would be:
sunroof, mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof, mag wheels, spoiler, 1
sunroof, spoiler, 1
spoiler, 1
However, my LINQ is rusty, and the code I am using to query my JSON data is returning the count of the individual features, not the features selected together:
JArray autoFeatures = JArray.Parse(jsonString);
var features = from f in autoFeatures.Select(feat => feat["features"]).Values<string>()
               group f by f into grp
               orderby grp.Count() descending
               select new { indFeature = grp.Key, count = grp.Count() };
foreach (var feature in features)
{
    Console.WriteLine("{0}, {1}", feature.indFeature, feature.count);
}
Actual Output:
sunroof, 5
mag wheels, 4
spoiler, 3
I was thinking maybe my query needs a 'distinct' in it, but I'm just not sure.
Upvotes: 3
You could use a HashSet to identify the distinct sets of features and group on those sets. That way, your LINQ looks basically identical to what you have now, but GroupBy needs an additional IEqualityComparer implementation to check whether one set of features is the same as another.
For example:
var featureSets = autoFeatures
    .Select(feature => new HashSet<string>(feature["features"].Values<string>()))
    .GroupBy(a => a, new HashSetComparer<string>())
    .Select(a => new { Set = a.Key, Count = a.Count() })
    .OrderByDescending(a => a.Count);
foreach (var result in featureSets)
{
    Console.WriteLine($"{String.Join(",", result.Set)}: {result.Count}");
}
The comparer class uses the SetEquals method of the HashSet class to check whether one set is the same as another (this also handles the strings appearing in a different order within the set).
public class HashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        // so if x and y both contain "sunroof" only, this is true
        // even if x and y are a different instance
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> obj)
    {
        // force comparison every time by always returning the same,
        // or we could do something smarter like hash the contents
        return 0;
    }
}
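As a rough sketch of the "something smarter" mentioned in the comment above, the hash code could be built from the set's contents by XOR-ing the element hashes. XOR is order-independent, so equal sets still hash the same. The class name ContentHashSetComparer and the in-memory rows array (standing in for the parsed JSON) are assumptions made for illustration. The framework also ships a ready-made comparer via HashSet<T>.CreateSetComparer(), which may be enough without any custom class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of HashSetComparer<T> that hashes the set contents
// instead of returning a constant.
public class ContentHashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> obj)
    {
        // XOR is commutative, so the enumeration order of the set
        // does not affect the resulting hash.
        int hash = 0;
        foreach (T item in obj)
        {
            hash ^= item == null ? 0 : obj.Comparer.GetHashCode(item);
        }
        return hash;
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        // Stand-in for the parsed JSON: one string array per "features" entry.
        var rows = new[]
        {
            new[] { "sunroof", "mag wheels" },
            new[] { "sunroof" },
            new[] { "mag wheels", "sunroof" }, // same set, different order
        };

        var counts = rows
            .Select(r => new HashSet<string>(r))
            .GroupBy(s => s, new ContentHashSetComparer<string>())
            .Select(g => new { Set = g.Key, Count = g.Count() })
            .OrderByDescending(g => g.Count);

        foreach (var c in counts)
        {
            Console.WriteLine($"{string.Join(",", c.Set)}: {c.Count}");
        }
    }
}
```

With this sample data the two-element set is counted twice and the single "sunroof" set once. Swapping in HashSet<string>.CreateSetComparer() as the GroupBy comparer should give the same grouping.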
Upvotes: 3
The problem is in the Select. Calling Values<string>() on the whole sequence flattens the arrays, so each individual value becomes its own item. Instead, you need to combine the values of each features array into a single string and group on that. Here is how you do it:
var features = from f in autoFeatures.Select(feat => string.Join(",", feat["features"].Values<string>()))
               group f by f into grp
               orderby grp.Count() descending
               select new { indFeature = grp.Key, count = grp.Count() };
This produces the following output:
sunroof,mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof,mag wheels,spoiler, 1
sunroof,spoiler, 1
spoiler, 1
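One caveat with a joined string as the grouping key: it is order-sensitive, so "sunroof,mag wheels" and "mag wheels,sunroof" would land in different groups. If the features can arrive in any order, sorting them before joining is one way around that. The sketch below assumes a hypothetical helper named ToKey and uses in-memory arrays as a stand-in for the parsed JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeatureKeys
{
    // Hypothetical helper: builds an order-independent key for one feature list
    // by removing duplicates and sorting before joining.
    public static string ToKey(IEnumerable<string> features)
    {
        return string.Join(",", features.Distinct().OrderBy(f => f, StringComparer.Ordinal));
    }

    public static void Main()
    {
        // Stand-in for the parsed JSON: one string array per "features" entry.
        var rows = new[]
        {
            new[] { "sunroof", "mag wheels" },
            new[] { "mag wheels", "sunroof" }, // same features, different order
            new[] { "spoiler" },
        };

        var counts = from key in rows.Select(r => ToKey(r))
                     group key by key into grp
                     orderby grp.Count() descending
                     select new { Features = grp.Key, Count = grp.Count() };

        foreach (var c in counts)
        {
            Console.WriteLine("{0}, {1}", c.Features, c.Count);
        }
        // prints:
        // mag wheels,sunroof, 2
        // spoiler, 1
    }
}
```

Against the JArray from the question, the same helper could presumably be called as ToKey(feat["features"].Values<string>()) inside the Select.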
Upvotes: 4