Arnab
Arnab

Reputation: 2354

Top 5 of a column and top 5 of another column for each of first column

I have a lot of data in the below format..

var data1 = new[] { 
    new { Product = "Product 1", Year = 2009, Sales = 1212 },
    new { Product = "Product 2", Year = 2009, Sales = 522 },
    new { Product = "Product 1", Year = 2010, Sales = 1337 },
    new { Product = "Product 2", Year = 2011, Sales = 711 },
    new { Product = "Product 2", Year = 2012, Sales = 2245 },
    new { Product = "Product 3", Year = 2012, Sales = 1000 }
};

If I wanted to get the top 20 rows with max sales, I could do something as below..

data1.OrderByDescending(o=>o.Sales).Take(20);

But what I want to do is get the top 5 Products and (for those products) the top 5 years along with their sales.

So, the output would be something like below:

var outputdata = new[] {  
    new { Product = "Product 1", Year = 2012, Sales = 2245 },
    new { Product = "Product 1", Year = 2010, Sales = 1337 },
    new { Product = "Product 1", Year = 2009, Sales = 1212 },
    new { Product = "Product 1", Year = 2011, Sales = 711 },
    new { Product = "Product 1", Year = 2013, Sales = 522 },  
    new { Product = "Product 2", Year = 2012, Sales = 1000 }
};

This might be a similar question for sql. but unfortunately could not understand how to convert to linq.

Upvotes: 0

Views: 102

Answers (3)

Gilad Green
Gilad Green

Reputation: 37299

Ok if I understood correctly: First group by the product so you can order by the total sales of a product. Then you can take only the amount you want. Use SelectMany to flatten the groups:

var data = new[] {
          new { Product = "Product 1", Year = 2009, Sales = 1212 },
          new { Product = "Product 2", Year = 2009, Sales = 522 },
          new { Product = "Product 1", Year = 2010, Sales = 1337 },
          new { Product = "Product 2", Year = 2011, Sales = 711 },
          new { Product = "Product 2", Year = 2012, Sales = 2245 },
          new { Product = "Product 3", Year = 2012, Sales = 1000 }
      };
int numberOfProducts = 2;
int numberOfYearsForEachProduct = 3;

var result = data.GroupBy(x => x.Product)
    .OrderByDescending(x => x.Sum(y => y.Sales)) //Order products by their total sum of `Sales`
    .Take(numberOfProducts )
    .SelectMany(x => x.OrderByDescending(y => y.Sales).Take(numberOfYearsForEachProduct)) // Take only the N top years for each product
    .ToList();

I used smaller numbers in the Take so I can see that it is doing it correctly

Upvotes: 1

Hamid Pourjam
Hamid Pourjam

Reputation: 20754

First of all you should get the 20 most sold products

var top20Products = data1
    .GroupBy(x => x.Product)
    .OrderByDescending(group => group.Sum(x => x.Sales))
    .Select(group => group.Key)
    .Take(20);

and then select top 5 most sold years of them

var top5yearsOfTop20products = top20Products
    .SelectMany(product => data1
        .Where(x => x.Product == product)
        .OrderByDescending(x => x.Sales)
        .Take(5));

Upvotes: 1

kashi_rock
kashi_rock

Reputation: 557

If I get you correctly, you want to get top 20 sales for top 5 products.

var ord = data1.OrderByDescending(o => o.Sales)
               .Select(o => o.Product)
               .Distinct().Take(5);//Get top 5 products by their sales

var salesForTopProducts = data1.OrderByDescending(o => o.Sales)
                               .Where(o => ord.Contains(o.Product))
                               .Take(20);//Get top 20 sales for top 5 products

Upvotes: 0

Related Questions