Reputation: 503
For a successful request, the log file is written as shown below:
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
A request starts with a line beginning with o, is followed by some data (as shown above), and ends with another line beginning with o. This is the pattern.
If there are multiple such requests, the log file keeps appending them, as shown below:
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
The file may contain some wrongly generated data, either for earlier requests or for the latest request.
If the wrong data is not for the latest request, it can be ignored.
Example :
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# There should be an indication of the request here, i.e. a line beginning with o, followed by some data
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
If the wrong data is for the latest request, it should be caught and highlighted.
Example :
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# No line present here, i.e. (o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd)
The missing line can be either the first or the last line beginning with o.
I need to check that the logs for every request are written in this format, and how many successful and unsuccessful requests are captured in the file.
Approach 1: read the file contents and parse them line by line (checking whether a line begins with o, etc.), which I don't feel is feasible.
Approach 2: use a regex, which I feel is the optimal solution.
Which would be best, and could you please help me achieve it?
Tried so far:
import re

reg_ex1 = r"o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+-"
reg_ex2 = r"o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+[a-zA-Z0-9_]+"
with open("some_file.log", 'r') as content_file:
    content = content_file.read()
# Count the opening lines (ending in "-") and the closing lines (ending in text)
pattern1 = re.compile(reg_ex1)
begin_lines = len(pattern1.findall(content))
pattern2 = re.compile(reg_ex2)
end_lines = len(pattern2.findall(content))
if begin_lines == end_lines:
    print("File has successful requests captured")
else:
    print("File has un-successful requests captured")
# If wrong-data generated is not for latest request, can be ignored.
# If wrong-data generated is for latest request, it should be caught and highlighted.
This may not be a good idea, though; please let me know.
UPD:
o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456===123 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573==123 1452830400 1 926731
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456===456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573===456 1452830400 1 926732
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456===789 1452830400 1 14523
n C-73652 1452830400 1 231543
n B-967845 1452830400 1 374513
n G-809573===789 1452830400 1 926733
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
For the above text, I want to extract packets 1 and 3.
Upvotes: 1
Views: 179
Reputation: 16772
To check whether the file is good or bad, we can look at the first and last lines of the file, considering that both should begin with o.
list.txt:
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# Should be indication of request i.e., line beginning with o, followed some data
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfd
Hence:
logFile = "list.txt"
with open(logFile) as f:
    content = f.readlines()

# you may also want to remove empty lines
content = [l.strip() for l in content if l.strip()]

for line in content:
    if line.startswith("o"):  # check if the first line starts with o
        if str(content[-1]).strip("[']").split()[0] == 'o':  # check if last line starts with o
            print("File is good.")
        else:
            print("File is bad.")
        break
    else:  # end if the first line does not start with o
        print("File is bad.")
        break
EDIT:
To get all the responses between a valid pair of o's:
list.txt:
o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 926731
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573 1452830400 1 926732
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456 1452830400 1 14523
n C-73652 1452830400 1 231543
n B-967845 1452830400 1 374513
n G-809573 1452830400 1 926733
Hence:
import re

def GetTheResponses(infile):
    with open(infile) as fp:
        red = fp.read()
    # non-greedy match of everything between one "o " and the next "o "
    for result in re.findall('o (.*?)o ', red, re.S):
        print(result)

GetTheResponses('list.txt')
OUTPUT:
123456789.000 10.10.10.10 3 30 10 001-
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 926731
123456789.000 10.10.10.10 3 30 10 002-
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573 1452830400 1 926732
EDIT 2: (for better readability):
count = 1
for result in re.findall('o (.*?)o ', red, re.S):
    print("Response Packet: {}".format(count))
    # drop the first line (the rest of the opening o line), keep the n lines
    print("\n".join(result.split("\n")[1:]))
    count += 1
OUTPUT:
Response Packet: 1
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 926731
Response Packet: 2
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573 1452830400 1 926732
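Note that 'o (.*?)o ' pairs every o with the next o, whether or not that next line is really a closing line. For the data in the question's UPD (where packets 1 and 3 are wanted), a stricter pattern can help. Below is a minimal sketch, under the assumption (taken only from the samples) that an opening line ends with "-" and a closing line does not. The names PACKET_RE, complete_packets and SAMPLE are mine, and the closing text is shortened for readability.

```python
import re

# Assumption: opening lines end with "-", closing lines do not.
PACKET_RE = re.compile(
    r"^o [^\n]*-\n"          # opening line, e.g. "o ... 001-"
    r"((?:n [^\n]*\n)+)"     # one or more data lines (captured)
    r"o [^\n]*[^-\n]$",      # closing line, must not end with "-"
    re.MULTILINE,
)

# Shortened version of the UPD sample: packet 2 has no closing line.
SAMPLE = (
    "o 123456789.000 10.10.10.10 3 30 10 001-\n"
    "n A-123456===123 1452830400 1 14521\n"
    "n C-73652 1452830400 1 231541\n"
    "o 123456789.000 10.10.10.10 3 30 10 some_random_text\n"
    "o 123456789.000 10.10.10.10 3 30 10 002-\n"
    "n A-123456===456 1452830400 1 14522\n"
    "o 123456789.000 10.10.10.10 3 30 10 003-\n"
    "n A-123456===789 1452830400 1 14523\n"
    "o 123456789.000 10.10.10.10 3 30 10 some_random_text\n"
)


def complete_packets(text):
    """Return the data lines of every packet that has both o lines."""
    return [m.group(1).splitlines() for m in PACKET_RE.finditer(text)]


if __name__ == "__main__":
    for number, packet in enumerate(complete_packets(SAMPLE), start=1):
        print("Complete packet {}:".format(number))
        print("\n".join(packet))
```

This sketch does not tolerate trailing whitespace or comment lines inside a packet; if the real log has those, the pattern would need loosening.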
Upvotes: 2
Reputation: 328
^n\s.+[\n\r]+o\s.+[\n\r]+n\s.+|^n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+(?!o)|^o\s.+[\n\r]+o\s.+[\n\r]+o\s.+
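The pattern above was posted without explanation. Read as three alternatives, it seems to flag malformed spots rather than valid packets: a single o line sandwiched between n lines, four n lines not followed by an o line, and three o lines in a row. Here is a minimal sketch of applying it, assuming re.MULTILINE was intended so that ^ anchors at every line start. The function name find_anomalies and the tiny sample strings are hypothetical.

```python
import re

# The pattern from the answer, split over three lines for readability.
# Assumption: re.MULTILINE is needed so "^" matches at each line start.
PATTERN = re.compile(
    r"^n\s.+[\n\r]+o\s.+[\n\r]+n\s.+"
    r"|^n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+(?!o)"
    r"|^o\s.+[\n\r]+o\s.+[\n\r]+o\s.+",
    re.MULTILINE,
)


def find_anomalies(text):
    """Return every stretch of text the pattern flags as suspicious."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# Hypothetical miniature log pieces in the same shape as the question's data.
OPEN = "o 1 2 -\n"
DATA = "n a 1\nn b 2\nn c 3\nn d 4\n"
CLOSE = "o 1 2 text\n"

GOOD = OPEN + DATA + CLOSE + OPEN + DATA + CLOSE
BAD = OPEN + DATA + OPEN + DATA + CLOSE  # first request never closed

if __name__ == "__main__":
    print(find_anomalies(GOOD))
    print(find_anomalies(BAD))
```

The second alternative is hard-wired to exactly four n lines per request, so it only fits logs shaped like the samples.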
Upvotes: 0
Reputation: 1739
I would definitely recommend Approach 1. Why? This way, we have the flexibility to read and iterate over each line individually.
with open('file.txt', 'r') as fp:
    line = fp.readline()
    print(type(line))  # string
    # do anything with line (a string):
    # 1. split_list = line.split() -- list of values separated by whitespace
    # 2. Check the type of each element:
    #    split_list[0].isalpha(),
    #    split_list[0].isdigit(),
    #    split_list[0].isspace() and so on, then add what is required to the final dict/list
Make sure to wrap every step in try/except, as log files are unpredictable.
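To make Approach 1 concrete, here is a minimal line-by-line sketch that records each request in file order as complete or broken, so you can both count them and check whether the latest one is bad. It assumes (from the question's samples only) that an opening o line ends with "-" and a closing o line does not. classify_requests and SAMPLE are names I made up, and the closing text is shortened.

```python
def classify_requests(lines):
    """Return a list of (is_complete, data_lines) tuples in file order."""
    results = []
    data, opened = [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and annotations
        if line.startswith("n "):
            data.append(line)
        elif line.startswith("o "):
            if line.endswith("-"):  # assumption: this marks an opening line
                if opened or data:
                    # previous request never got its closing line
                    results.append((False, data))
                data, opened = [], True
            else:  # closing line
                # complete only if it was opened and carried some data
                results.append((opened and bool(data), data))
                data, opened = [], False
        # lines starting with anything else are ignored in this sketch
    if opened or data:
        results.append((False, data))  # file ended in the middle of a request
    return results


# Shortened version of the question's UPD sample: request 2 is never closed.
SAMPLE = """\
o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456===123 1452830400 1 14521
o 123456789.000 10.10.10.10 3 30 10 some_random_text
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456===456 1452830400 1 14522
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456===789 1452830400 1 14523
o 123456789.000 10.10.10.10 3 30 10 some_random_text
"""

if __name__ == "__main__":
    results = classify_requests(SAMPLE.splitlines())
    good = sum(1 for ok, _ in results if ok)
    print("complete: {}, broken: {}".format(good, len(results) - good))
    if results and not results[-1][0]:
        print("Latest request is broken!")
```

For a real file you can pass the open file object directly, e.g. classify_requests(fp), since it iterates line by line.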
Upvotes: 0