Martijn B

Reputation: 4075

How to read this large text file? Memory mapped file?

I am in the design phase of a simple tool I want to write that needs to read large log files. To give you some context, I will first explain a bit about it.

The log files I need to read consist of log entries which always follow this 3-line format:

statistics : <some data which is more or less the same length, about 100 chars>
request :  <some xml string which can be small (10KB) or big (25MB) and anything in between>
response :  <ditto>

The log files can be about 100-600 MB in size, which means a lot of log entries. These log entries can be related to each other, and because of that I need to start reading the file from the end towards the beginning. The relationships can be deduced from the statistics line.

I want to use the info in the statistics lines to build up a datagrid which the users can use to search through the data and do some filtering. I don't want to load the request/response lines into memory until the user actually needs them. In addition, I want to keep the memory load small by limiting the maximum number of loaded request/response entries.

So I think I need to save the offsets of the statistics lines while parsing the file for the first time, creating an index of statistics. Then, when the user clicks on a statistic (which is an element of a log entry), I read the request/response from the file using that offset. I can then hold it in some memory pool which makes sure there are not too many loaded request/response entries (see the earlier requirement). A rough sketch of the indexing step I have in mind is below.
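
Something like this is what I have in mind for building the index (rough, untested C# sketch; the class and member names are just placeholders I made up, and I'm assuming UTF-8/ASCII logs with '\n' line endings):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    class LogIndexer
    {
        // Byte offset of the start of each "statistics" line, in file order.
        public readonly List<long> StatisticsOffsets = new List<long>();

        public void BuildIndex(string path)
        {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 64 * 1024))
            {
                long lineStart = 0;
                var prefix = new StringBuilder();
                int b;
                while ((b = fs.ReadByte()) != -1)
                {
                    if (b == '\n')
                    {
                        // Only the first few chars are needed to recognise the line type,
                        // so the big request/response lines are never buffered in full.
                        if (prefix.ToString().StartsWith("statistics"))
                            StatisticsOffsets.Add(lineStart);
                        prefix.Clear();
                        lineStart = fs.Position;
                    }
                    else if (b != '\r' && prefix.Length < 16)
                    {
                        prefix.Append((char)b);
                    }
                }
            }
        }
    }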

The problem is that I don't know how often the user is going to need the request/response data. It could be a lot, it could be a few times. In addition, the log file could be loaded from a network share.

The question I have is:

  1. Is this a scenario where you should use a memory-mapped file, given that there could be a lot of read operations? Or is it better to use a plain FileStream? (For reference, a rough sketch of both access patterns is below.) BTW, I don't need write operations to the log file at this stage, but that could change in the future!
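
This is roughly how I picture the two access patterns for reading a request/response chunk at a stored offset (untested sketch, names made up):

    using System.IO;
    using System.IO.MemoryMappedFiles;

    static class ChunkReader
    {
        static byte[] ReadChunkWithFileStream(string path, long offset, int length)
        {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                fs.Seek(offset, SeekOrigin.Begin);
                var buffer = new byte[length];
                int read = 0;
                while (read < length)
                {
                    int n = fs.Read(buffer, read, length - read);
                    if (n == 0) break;   // end of file reached early
                    read += n;
                }
                return buffer;
            }
        }

        static byte[] ReadChunkWithMemoryMap(string path, long offset, int length)
        {
            using (var mmf = MemoryMappedFile.CreateFromFile(
                       path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
            using (var view = mmf.CreateViewStream(offset, length, MemoryMappedFileAccess.Read))
            {
                // For a sketch, assume the view delivers the full chunk in one read.
                var buffer = new byte[length];
                view.Read(buffer, 0, length);
                return buffer;
            }
        }
    }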

If you have other tips or see flaws in my thinking so far, please let me know as well. I am open to any approach.

Update:

To clarify some more:

Upvotes: 0

Views: 1538

Answers (3)

paparazzo

Reputation: 45096

I don't disagree with the answer from walther. I would go DB or all in memory.

Why are you so concerned about saving memory? 600 MB is not that much. Are you going to be running on machines with less than 2 GB of memory?

Load it into a dictionary with the statistics as the key and, as the value, a class with two properties: request and response. Dictionary is fast. LINQ is powerful and fast.
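
Something along these lines (untested sketch; class and property names are just examples):

    using System.Collections.Generic;
    using System.Linq;

    class LogEntry
    {
        public string Request  { get; set; }
        public string Response { get; set; }
    }

    class AllInMemory
    {
        // Keyed by the statistics line (or whatever part of it identifies the entry).
        readonly Dictionary<string, LogEntry> entries = new Dictionary<string, LogEntry>();

        public void Add(string statistics, string request, string response)
        {
            entries[statistics] = new LogEntry { Request = request, Response = response };
        }

        public IEnumerable<KeyValuePair<string, LogEntry>> Filter(string term)
        {
            // LINQ makes the grid filtering trivial once everything is in memory.
            return entries.Where(e => e.Key.Contains(term));
        }
    }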

Upvotes: 0

cwa

Reputation: 1142

If you're sending the request/response chunk over the network, the network send() time is likely to be so much greater than the difference between seek()/read() and using a memory map that it won't matter. To really make this scale, a simple solution is to just break up the file into many files, one for each chunk you want to serve (since the "request" can be up to 25 MB). Then your HTTP server will send that chunk as efficiently as possible (perhaps even using zero-copy, depending on your webserver). If you have many small "request" chunks, and only a few giant ones, you could break out only the ones past a certain threshold.
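
A rough, untested sketch of that break-out idea (the threshold, paths, and names are just placeholders):

    using System.IO;
    using System.Text;

    static class ChunkSplitter
    {
        const int Threshold = 1 * 1024 * 1024; // 1 MB: keep small chunks inline, spill big ones

        // Returns either the chunk itself or the path of the file it was spilled to.
        public static string StoreChunk(string chunk, string spillDirectory, string id)
        {
            if (Encoding.UTF8.GetByteCount(chunk) < Threshold)
                return chunk;

            Directory.CreateDirectory(spillDirectory);
            string path = Path.Combine(spillDirectory, id + ".xml");
            File.WriteAllText(path, chunk);
            return path;
        }
    }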

Upvotes: 1

walther

Reputation: 13600

You're talking about stored data that has defined relationships between entries... Maybe it's just me, but this scenario just calls for some kind of relational database. I'd suggest considering a portable DB, like SQL Server CE for instance. It'll make your life much easier and provide exactly the functionality you need. If you use a DB instead, you can query exactly the data you need, without ever having to handle large files like this.
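
For example, with SQL Server CE, something along these lines (untested sketch; the table and column names are made up):

    using System.Data.SqlServerCe;

    static class LogDb
    {
        public static void CreateAndFill(string dbPath)
        {
            string connStr = "Data Source=" + dbPath;

            using (var engine = new SqlCeEngine(connStr))
                engine.CreateDatabase();

            using (var conn = new SqlCeConnection(connStr))
            {
                conn.Open();

                using (var cmd = new SqlCeCommand(
                    "CREATE TABLE LogEntry (Id INT IDENTITY PRIMARY KEY, " +
                    "Statistics NVARCHAR(256), Request NTEXT, Response NTEXT)", conn))
                {
                    cmd.ExecuteNonQuery();
                }

                using (var cmd = new SqlCeCommand(
                    "INSERT INTO LogEntry (Statistics, Request, Response) " +
                    "VALUES (@s, @req, @resp)", conn))
                {
                    cmd.Parameters.AddWithValue("@s", "...statistics line...");
                    cmd.Parameters.AddWithValue("@req", "<request/>");
                    cmd.Parameters.AddWithValue("@resp", "<response/>");
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }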

Upvotes: 1
