Reputation: 5450
I want to know the most efficient way to parse a text file. For example, let's say I have the following text file:
Number of connections server is: 1
Server status is: ACTIVE
Number of connections to server is: 4
Server status is: ACTIVE
Server is not responding: 13:25:03
Server connection is established: 13:27:05
What I want to do is go through the file and gather information, for example the number of connections to the server or the times the server went down. I'd like to store these values, perhaps in lists, so that I can view or plot them later.
So what is the best way to do this, assuming I have my keywords in a list as follows:
referenceLines = ['connections server', 'Server status', 'not responding']
Note that the list does not contain complete sentences, only parts of them. I want to go through the file line by line and check whether each line contains any entry in the referenceLines list; if it does, get the index of that entry and call the corresponding function.
What would be the most efficient way to do this, in terms of both time and memory, given that a typical text file will be about 50MB in size?
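A minimal sketch of the line-by-line approach described above, assuming one handler per keyword kept in a list parallel to referenceLines, so the index of the matching keyword selects the handler. The handler names, the storage lists and the sample lines are hypothetical and for illustration only.

```python
# Keywords from the question, plus one handler per keyword kept in a
# parallel list so that the index of the matching keyword picks the handler.
referenceLines = ['connections server', 'Server status', 'not responding']

connections, statuses, outages = [], [], []

def handle_connections(line):
    # hypothetical handler: store the integer after the last ': '
    connections.append(int(line.rsplit(': ', 1)[1]))

def handle_status(line):
    # hypothetical handler: store the status word, e.g. 'ACTIVE'
    statuses.append(line.rsplit(': ', 1)[1].strip())

def handle_outage(line):
    # hypothetical handler: store the timestamp after 'not responding: '
    outages.append(line.split(': ', 1)[1].strip())

handlers = [handle_connections, handle_status, handle_outage]

def process(lines):
    for line in lines:  # an open file object also yields one line at a time
        for index, keyword in enumerate(referenceLines):
            if keyword in line:
                handlers[index](line)
                break  # assume each line matches at most one keyword

sample = [
    "Number of connections server is: 1\n",
    "Server status is: ACTIVE\n",
    "Server is not responding: 13:25:03\n",
]
process(sample)
print(connections, statuses, outages)  # [1] ['ACTIVE'] ['13:25:03']
```

Passing an open file object to process() instead of the sample list keeps memory use roughly constant, because Python reads the file one line at a time rather than loading all 50MB at once.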
Thank you.
Upvotes: 1
Views: 3128
Reputation: 25197
Here's one possible approach. It uses a regular expression pattern of the form 'keyword1|keyword2' to search for multiple keywords at once.
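import re

def func1(line):
    # do something with a "connections server" line
    pass

def func2(line):
    # do something with a "Server status" line
    pass

actions = {'connections server': func1,
           'Server status': func2}
regex = re.compile('|'.join(re.escape(key) for key in actions))

with open('server.log') as file:  # hypothetical filename
    for line in file:
        for matchobj in regex.finditer(line):
            actions[matchobj.group()](line)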
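To make the dispatch concrete, here is a minimal sketch applying the same pattern to the sample lines from the question. The handlers are hypothetical: each appends the text after the last ': ' to a list stored per keyword. An io.StringIO stands in for the real file.

```python
import io
import re
from collections import defaultdict

# One list of collected values per keyword (hypothetical storage layout).
collected = defaultdict(list)

def record(key):
    # Build a handler that stores the text after the last ': ' under `key`.
    def handler(line):
        collected[key].append(line.rsplit(': ', 1)[-1].strip())
    return handler

actions = {key: record(key)
           for key in ['connections server', 'Server status', 'not responding']}
regex = re.compile('|'.join(re.escape(key) for key in actions))

# io.StringIO stands in for open(...); the text is the sample from the question.
sample = io.StringIO(
    "Number of connections server is: 1\n"
    "Server status is: ACTIVE\n"
    "Server is not responding: 13:25:03\n"
    "Server connection is established: 13:27:05\n"
)

for line in sample:
    match = regex.search(line)  # first keyword found in the line, if any
    if match:
        actions[match.group()](line)

print(dict(collected))
```

This variant uses regex.search, so each line triggers at most one handler; the finditer loop calls a handler once for every keyword occurrence in the line.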
Upvotes: 1
Reputation: 1235
If the text file you want to parse always contains the same fields in the same order, then mikerobi's solution is good. Otherwise, you need to iterate through the lines and try to detect the referenceLines entries in each one.
Upvotes: 1
Reputation: 8491
As a practical approach, I suggest that you implement this in a series of steps, measuring the performance at each step to gauge the cost of the approach you are using on your test data.
For example:
The optimal solution will depend on your data (for example, how many reference lines you are using), but it should only take a few seconds on a modern machine.
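One way to follow the measure-as-you-go advice is a small timing harness built on time.perf_counter. The two strategies compared (an `in` check per keyword versus one combined regex) and the synthetic data are assumptions for illustration; real numbers depend on your machine and your log file.

```python
import re
import time

keywords = ['connections server', 'Server status', 'not responding']
regex = re.compile('|'.join(re.escape(k) for k in keywords))

# Synthetic test data (hypothetical): the sample lines repeated many times.
lines = [
    "Number of connections server is: 1\n",
    "Server status is: ACTIVE\n",
    "Server is not responding: 13:25:03\n",
    "Server connection is established: 13:27:05\n",
] * 100_000

def count_with_in(lines):
    # Test each keyword with the `in` operator.
    return sum(1 for line in lines if any(k in line for k in keywords))

def count_with_regex(lines):
    # Test all keywords at once with one alternation pattern.
    return sum(1 for line in lines if regex.search(line))

def timed(func, lines):
    # Return the function's result and the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(lines)
    return result, time.perf_counter() - start

for func in (count_with_in, count_with_regex):
    matches, seconds = timed(func, lines)
    print(f"{func.__name__}: {matches} matching lines in {seconds:.3f} s")
```

Checking that both strategies report the same match count before comparing their times guards against a "faster" version that is simply doing less work.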
Upvotes: 1
Reputation: 20878
If every line contains the separator ": ", you can split each line into a message and a value:
message, value = line.split(': ', 1)
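A minimal sketch of the split approach applied to the sample log from the question, assuming every relevant line uses the "message: value" layout; lines without the separator are skipped instead of raising ValueError. Grouping values by message in a dict of lists is an illustrative choice, not part of the answer above.

```python
from collections import defaultdict

sample = """Number of connections server is: 1
Server status is: ACTIVE
Number of connections to server is: 4
Server status is: ACTIVE
Server is not responding: 13:25:03
Server connection is established: 13:27:05"""

values = defaultdict(list)  # message text -> list of values seen for it

for line in sample.splitlines():
    if ': ' not in line:
        continue  # skip lines that do not follow the "message: value" layout
    # maxsplit=1 splits at most once, so a value containing ': ' stays whole
    message, value = line.split(': ', 1)
    values[message].append(value)

print(values['Server status is'])  # ['ACTIVE', 'ACTIVE']
```

Keying on the full message text works when the messages are fixed; if their wording varies, the keyword check from the question can be applied to `message` instead.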
Upvotes: 4