Reputation: 1881
I've been given a task to build a prototype for an app. I don't have any code yet, as the solution concepts that I've come up with seem stinky at best...
The problem:
The solution consists of various Azure projects that operate on lots of data stored in Azure SQL databases. Almost every action that happens creates a gzipped log file in blob storage, so that's one .gz file per log entry.
We also need a small desktop (WPF) app that can read, filter and sort these log files.
I have absolutely 0 influence on how the logging is done, so this is something that cannot be changed to solve this problem.
Possible solutions that I've come up with (conceptually):
1:
The problem with this is that, depending on the filter, this could mean a whole lot of data to download (which is slow) and to process (which will also not be very snappy). I really can't see this as a usable application.
2:
With this approach, will I run into problems with decompressing these files if there are a lot of them (will it take up extra space on the storage/compute instance where the service is running)?
EDIT: what I mean by filter is limiting the results by date and severity (info, warning, error). The .gz files are saved in a path structure that makes this quite easy, and I will not be filtering by looking into the files themselves.
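To make that concrete, here is a rough sketch of what listing by prefix could look like with the .NET storage client; the yyyy-MM-dd/severity/ path convention is just an assumption for illustration:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class ListByPrefix
{
    static void Main()
    {
        // Placeholder connection string and container name.
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var container = account.CreateCloudBlobClient().GetContainerReference("logs");

        // Hypothetical path convention: "{yyyy-MM-dd}/{severity}/{entry}.gz".
        // Listing with a prefix enumerates only the matching blobs;
        // nothing is downloaded yet.
        foreach (IListBlobItem item in container.ListBlobs("2013-05-01/error/", useFlatBlobListing: true))
        {
            var blob = item as CloudBlockBlob;
            if (blob != null)
                Console.WriteLine(blob.Name);
        }
    }
}
```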
3:
I'd also need some way of making the app update the displayed logs in real time, which I suppose would need to be done with repeated requests to the blob storage/service.
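For the prototype I imagine a plain client-side poll would do; a minimal sketch with a WPF DispatcherTimer, where RefreshLogList is a hypothetical method that would re-run the current filter:

```csharp
using System;
using System.Windows;
using System.Windows.Threading;

public class LogViewerWindow : Window
{
    private readonly DispatcherTimer _timer;

    public LogViewerWindow()
    {
        // Poll on a fixed interval; only results newer than the last
        // refresh would actually need to be fetched.
        _timer = new DispatcherTimer { Interval = TimeSpan.FromSeconds(30) };
        _timer.Tick += (s, e) => RefreshLogList();
        _timer.Start();
    }

    private void RefreshLogList()
    {
        // Hypothetical: re-run the current date/severity filter against
        // blob storage (or the service) and rebind the results.
    }
}
```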
This is not one of those "give me code" questions; I am looking for advice on best practices, or similar solutions that worked for similar problems. I also know this could be one of those "no one right answer" questions, as people have different approaches to problems. But I have some time to build a prototype, so I will be trying out different things, and I will select as the right answer the one that showed a solution that worked, or that steered me in the right direction, even if it does take some time before I actually build something and test it out.
Upvotes: 1
Views: 1124
Reputation: 24895
Simply storing the blobs isn't sufficient. The metadata you want to filter on should be stored somewhere else, where it's easy to filter on and retrieve. So I think you should split this into two problems:
A. How do I efficiently list all "gzips" with their metadata, and how can I apply a filter on these gzips in order to show them in my client application?
Solutions
Update: Since you only filter on date and severity you should review the Blob and Table options:
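For illustration, here is a sketch of what a Table Storage index could look like; the entity shape, table name, and key scheme are all assumptions on my part (PartitionKey = date for cheap date filtering, one row per .gz file):

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// One row per .gz file; the blob itself stays in blob storage.
public class LogIndexEntry : TableEntity
{
    public LogIndexEntry() { }
    public LogIndexEntry(string date, string rowKey)
    {
        PartitionKey = date;   // e.g. "20130501"
        RowKey = rowKey;       // e.g. inverted ticks for newest-first ordering
    }
    public string Severity { get; set; }  // "info" / "warning" / "error"
    public string BlobPath { get; set; }  // where the .gz actually lives
}

class QueryIndex
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var table = account.CreateCloudTableClient().GetTableReference("logindex");

        // "All errors on 2013-05-01" without touching a single blob.
        // Severity is a property filter, so this scans within the
        // date partition only -- fine at this scale.
        var query = new TableQuery<LogIndexEntry>().Where(
            TableQuery.CombineFilters(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "20130501"),
                TableOperators.And,
                TableQuery.GenerateFilterCondition("Severity", QueryComparisons.Equal, "error")));

        foreach (var entry in table.ExecuteQuery(query))
            System.Console.WriteLine(entry.BlobPath);
    }
}
```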
B. How do I display a "gzip" in my application (after double-clicking a search result, for example)?
Solutions
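For what it's worth, a minimal sketch of the display side: stream the blob through a GZipStream so nothing has to be staged on local disk (the connection string and blob path below are placeholders):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class ShowGzip
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var container = account.CreateCloudBlobClient().GetContainerReference("logs");
        var blob = container.GetBlockBlobReference("2013-05-01/error/some-entry.gz"); // placeholder path

        // Decompress while streaming -- the .gz never touches local disk.
        using (var blobStream = blob.OpenRead())
        using (var gzip = new GZipStream(blobStream, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```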
Upvotes: 1
Reputation: 3693
As I understand it, you have a set of log files in Azure Blob storage that are compressed in a particular way (gzip), and you want to display them.
How big are these files? Are you displaying every single piece of information in the log file?
Assuming that, since these are log files, they are static and historical... meaning that once the log/gzip file is created it cannot be changed (you are not updating the gzip file once it is out on Blob storage); only new files can be created...
One Solution
Why not create a worker role/job process that periodically goes out, scans the blob storage, and builds a persisted "database" that you can display from? The nice thing about this is that you are not putting the unzipping/business logic that extracts the log files into a WPF app or UI.
1) I would have the worker role scan the log files in Azure Blob storage.
2) Have some kind of mechanism to track which ones were processed and a current "state", maybe the UTC date of the last gzip file.
3) Do all the unzipping/extracting of the log files in the worker role.
4) Have the worker role place the content in a SQL database, Azure Table Storage, or a distributed cache for access.
5) Expose access through a REST service (ASP.NET Web API/Node.js etc). (Steps 1-4 are sketched below.)
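A rough sketch of steps 1-4, assuming the classic Microsoft.WindowsAzure.Storage client; the checkpoint persistence and SaveToIndex are hypothetical stubs:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class LogScannerWorker
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var container = account.CreateCloudBlobClient().GetContainerReference("logs");

        // Step 2: the "state" -- persist this somewhere durable in a real worker.
        DateTimeOffset checkpoint = DateTimeOffset.MinValue;

        while (true) // Step 1: periodically scan the container
        {
            DateTimeOffset newest = checkpoint;
            foreach (var blob in container.ListBlobs(null, useFlatBlobListing: true).OfType<CloudBlockBlob>())
            {
                DateTimeOffset modified = blob.Properties.LastModified ?? DateTimeOffset.MinValue;
                if (modified <= checkpoint) continue; // already processed

                // Step 3: unzip in the worker, not in the WPF client.
                string content;
                using (var gzip = new GZipStream(blob.OpenRead(), CompressionMode.Decompress))
                using (var reader = new StreamReader(gzip))
                    content = reader.ReadToEnd();

                SaveToIndex(blob.Name, content); // Step 4: hypothetical persistence call
                if (modified > newest) newest = modified;
            }
            checkpoint = newest;
            Thread.Sleep(TimeSpan.FromMinutes(1));
        }
    }

    static void SaveToIndex(string blobName, string content)
    {
        // Hypothetical: write to SQL, Azure Table Storage, or a cache --
        // whatever store the REST service (step 5) reads from.
    }
}
```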
You can add more things if you need to scale this out; for example, run this as a job that re-does all of the log files from a given time (refresh all). I don't know the size of your data, so I am not sure if that is feasible.
The nice thing about this is that if you need to scale your job (overnight), you can spin up 2, 3, 6 worker roles... extract the content and pass the result to a Service Bus or Storage Queue that would insert it into SQL, a cache, etc. for access. A minimal sketch of that hand-off follows below.
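Here is what that hand-off could look like with a Storage Queue; the queue name and message contents are made up:

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class QueueHandOff
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var queue = account.CreateCloudQueueClient().GetQueueReference("logs-to-index");
        queue.CreateIfNotExists();

        // Scanner side: enqueue just the blob name; the payload stays in blob storage.
        queue.AddMessage(new CloudQueueMessage("2013-05-01/error/some-entry.gz"));

        // Worker side (any of the 2, 3, 6 instances): pull, process, delete.
        CloudQueueMessage msg = queue.GetMessage();
        if (msg != null)
        {
            System.Console.WriteLine("Would unzip and index: " + msg.AsString);
            queue.DeleteMessage(msg);
        }
    }
}
```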
Upvotes: 1