Stewbob

Reputation: 16899

LINQ to JSON group query on array

I have a sample of JSON data that I am converting to a JArray with Newtonsoft Json.NET.

        string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['sunroof']},{'features': ['mag wheels']},{'features': ['sunroof','mag wheels','spoiler']},{'features': ['sunroof','spoiler']},{'features': ['sunroof','mag wheels']},{'features': ['spoiler']}]";

I am trying to retrieve the features that are most commonly requested together. Based on the above dataset, my expected output would be:

sunroof, mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof, mag wheels, spoiler, 1
sunroof, spoiler, 1
spoiler, 1

However, my LINQ is rusty, and the code I am using to query my JSON data is returning the count of the individual features, not the features selected together:

        JArray autoFeatures = JArray.Parse(jsonString);
        var features = from f in autoFeatures.Select(feat => feat["features"]).Values<string>()
                       group f by f into grp
                       orderby grp.Count() descending
                       select new { indFeature = grp.Key, count = grp.Count() };

        foreach (var feature in features)
        {
            Console.WriteLine("{0}, {1}", feature.indFeature, feature.count);
        }

Actual Output:
sunroof, 5
mag wheels, 4
spoiler, 3

I was thinking maybe my query needs a 'distinct' in it, but I'm just not sure.

Upvotes: 3

Views: 908

Answers (2)

steve16351

Reputation: 5812

You could use a HashSet to identify the distinct sets of features and group on those sets. Your LINQ then looks much like what you have now, but GroupBy needs an additional IEqualityComparer implementation so it can tell whether two sets of features are the same.

For example:

var featureSets = autoFeatures
    .Select(feature => new HashSet<string>(feature["features"].Values<string>()))
    .GroupBy(a => a, new HashSetComparer<string>())
    .Select(a => new { Set = a.Key, Count = a.Count() })
    .OrderByDescending(a => a.Count);

foreach (var result in featureSets)
{
    Console.WriteLine($"{String.Join(",", result.Set)}: {result.Count}");
}

The comparer class uses the SetEquals method of HashSet to check whether one set is the same as another, which also handles the strings appearing in a different order within the set:

public class HashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        // so if x and y both contain "sunroof" only, this is true 
        // even if x and y are a different instance
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> obj)
    {
        // returning a constant forces Equals to run for every pair of sets;
        // correct, but slower than hashing the contents of the set would be
        return 0; 
    }
}
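
As a possible alternative to the hand-written comparer, .NET ships a set comparer via HashSet<T>.CreateSetComparer(), which compares sets by content and also hashes by content. The sketch below is only an illustration under some assumptions: the `rows` array is a hypothetical stand-in for the parsed JArray (so the snippet runs without Json.NET), and its sixth row is deliberately reordered relative to the question's data to show that element order does not matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the parsed JArray: one inner array per "features" list.
string[][] rows =
{
    new[] { "sunroof", "mag wheels" },
    new[] { "sunroof" },
    new[] { "mag wheels" },
    new[] { "sunroof", "mag wheels", "spoiler" },
    new[] { "sunroof", "spoiler" },
    new[] { "mag wheels", "sunroof" }, // same set as the first row, different order
    new[] { "spoiler" },
};

var featureSets = rows
    .Select(r => new HashSet<string>(r))
    // Built-in comparer: two sets are equal when they hold the same elements.
    .GroupBy(s => s, HashSet<string>.CreateSetComparer())
    .Select(g => new { Set = g.Key, Count = g.Count() })
    .OrderByDescending(g => g.Count)
    .ToList();

foreach (var result in featureSets)
{
    Console.WriteLine($"{string.Join(",", result.Set)}: {result.Count}");
}
```

With the real data, the Select would build each set from feature["features"].Values<string>() as in the query above; only the comparer argument changes.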

Upvotes: 3

DetectivePikachu

Reputation: 650

The problem is in the Select. Calling Values<string>() on the whole sequence flattens every array, so each individual feature becomes its own item. Instead, you need to combine the values of each features array into a single string and group on that. Here is how you do it:

var features = from f in autoFeatures.Select(feat => string.Join(",", feat["features"].Values<string>()))
               group f by f into grp
               orderby grp.Count() descending
               select new { indFeature = grp.Key, count = grp.Count() };

This produces the following output:

sunroof,mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof,mag wheels,spoiler, 1
sunroof,spoiler, 1
spoiler, 1
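
One caveat with a joined string as the grouping key is that it is order-sensitive: "sunroof,mag wheels" and "mag wheels,sunroof" would land in different groups. A minimal sketch of one way around that, assuming it is acceptable to sort each feature list before joining, follows. The `rows` array is a hypothetical stand-in for the parsed JArray so the snippet runs without Json.NET, and its sixth row is reordered on purpose.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the parsed JArray: one inner array per "features" list.
string[][] rows =
{
    new[] { "sunroof", "mag wheels" },
    new[] { "sunroof" },
    new[] { "mag wheels" },
    new[] { "sunroof", "mag wheels", "spoiler" },
    new[] { "sunroof", "spoiler" },
    new[] { "mag wheels", "sunroof" }, // same features as the first row, different order
    new[] { "spoiler" },
};

var features = rows
    // Sort before joining so the key does not depend on the original order.
    .Select(r => string.Join(",", r.OrderBy(f => f, StringComparer.Ordinal)))
    .GroupBy(key => key)
    .OrderByDescending(g => g.Count())
    .Select(g => new { Features = g.Key, Count = g.Count() })
    .ToList();

foreach (var feature in features)
{
    Console.WriteLine("{0}, {1}", feature.Features, feature.Count);
}
```

The sorted key changes the display order of the features within each line, and a feature name that itself contains a comma would make the key ambiguous, so this is only suitable when the names are known to be comma-free.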

Upvotes: 4
