Reputation: 63
I need to fetch data for last 1 year which has around 3 million rows for approx. 30000 unique customers. I am looking for a way to see from last 365 days, how many days customer appeared and how many minutes customer has spent ? In other terms, I need to aggregate data for each customer for a year and calculate the average based on how many days customer appeared.
The approach I am looking for:
Get all the row data (Which has around 3 million rows and 10 columns) from the table and load into the memory. Perform LINQ query on it - Can you please help me to figure it out what would be the best approach to use Group By query or how can I perform optimized queries to get the average data for around 30000 customers. I am thinking to divide 3 million rows into two collections (Customers from only north and south vs east and west) while retrieving the data from table. Currently, it takes me around 60 seconds to retrieve all the data from table.
Average = Total minutes customer spent in Last 365 days / Days Customer appeared in Last 365 days
I also thought about performing Aggregations on SQL side and get average data but I think dumping data into memory once and then perform all filters would be much better approach. Any thoughts / suggestions are welcome.
Upvotes: 0
Views: 1289
Reputation: 30464
You wrote:
I want to query from last 365 days, how many days customer appeared and how many minutes customer has spent
Your requirement is ambiguous: If a Customer appears on 5 jan 2019 23:58 and leaves on 6 jan 2019 00:02, how many days does he appear? 2 days?
A database management system (DBMS) is much better suited for large queries than your local process. So if you have the choice, try to let your DBMS do the work, and only transfer the data that you actually plan to use to your local process.
Alas you forgot to show us your classes. From your description it seems that you have a table of Customers
and a table that represent their Appearances
. If you use entity framework, classes will be similar to the following:
class Customer
{
public int Id {get; set;}
... // other properties
// every Customer has zero or more Appearances (one-to-many)
public virtual ICollection<Appearance> Appearances {get; set;}
}
class Appearance
{
public int Id {get; set;}
public DateTime StartTime {get; set;} // Customer appears
public DateTime EndTime {get; set;} // Customer goes away
... // other properties
// every appearance belongs to exactly one Customer, using foreign key
public int CustomerId {get; set;}
public virtual Customer Customer {get; set;}
}
If you use entity framework, your query is easy:
TimeSpan last365Days = TimeSpan.FromDays(365);
DateTime startTime = DateTime.UtcNow-last365Days;
var query = dbContext.Customers
.Where(customer => ...) // only if you don't want all Customers
.Select(customer => new
{
// select only the Customer properties you actually plan to use
Id = Customer.Id,
Name = Customer.Name,
// total time spent in ticks (consider using seconds, minutes, ...)
TimeSpentTicks = customer.Appearances
// keep only appearances in the last 365 days
.Where(appearance.StartTime >= startTime)
// the time spent during this appearance in ticks
.Select(appearance => (appearance.EndTime - appearance.StartTime).Ticks)
// Sum these ticks
.Sum(),
// to calculate the number of days:
// from StartTime and EndTime take the day number of the year
// keep distinct day number
// and count the number of distinct day numbers
NumberOfAppearanceDays = customer.Appearances
.SelectMany(appearance => new
{
appearance.StartTime.DayOfYear,
appearance.EndTime.DayOfYear,
})
.Distinct()
.Count(),
});
If you don't use entity framework, but some other method that can handle IQueryable<...>
you'll have to do the group-join yourself
IQueryable<Customer> customers = ...
IQueryable<Appearance> appearances = ...
.Where(appearance.StartTime >= startTime);
var query = customers.GroupJoin(appearances, // GroupJoin customers and appearances
customer => customer.Id, // from every customer take the Id
appearance => appearance.CustomerId, // from every appearance take the CustomerId
(customer, appearances) => new // from every customer with all his
{ // appearances, make one new object
Id = Customer.Id,
Name = Customer.Name,
TimeSpentTicks = appearances
.Select(appearance => (appearance.EndTime - appearance.StartTime).Ticks)
.Sum(),
NumberOfAppearanceDays = appearances
.SelectMany(appearance => new
{
appearance.StartTime.DayOfYear,
appearance.EndTime.DayOfYear,
})
.Distinct()
.Count(),
});
Wait! you're not finished yet! You've got the time spent in Tick, you'll have to convert them to TimeSpans
. Normally you would use TimeSpan.FromTicks(...)
for this, but SQL does not know this. Now that you've limited the data to the data you actually plan to use
you can move the selected data to your local process and then use TimeSpan.FromTicks(...)
Continuing the Query:
.AsEnumerable()
.Select(fetchedCustomerInfo => new
{
Id = fetchedCustomerInfo.Id,
...
TimeSpent = TimeSpent.FromTicks(fetchedCustomerInfo.TimeSpentTicks),
NumberOfAppearanceDays = fetchedCustomerInfo.NumberOfAppearanceDays,
});
It might be that you can't even handle IQueryable<...>
, in that case you'll have to use Dapper and SQL to do the GroupJoin. Search on StackOverflow on how to do a GroupJoin. You should be able to compose the SQL statement from the code above
Upvotes: 2