liv2hak

Reputation: 14970

Filtering and extracting fields from a text file in python

I have a text file in the following format.

03/12 20:23:26.11: 04:23:26 L9 <Mx  Acc  Magnum All            XDV:00111A0000000117 00D3001200870172 01FF6000F01CFE81 3D26000000000300
03/12 20:23:26.11: 04:23:26 L9 <Mx  Acc  MID 0x1500 Len 26   XDV:00111A0000000117 00D3001200870172 01FF6000F01CFE81 3D26000000000300
03/12 20:23:26.11: 04:23:26 L8 <Mx  JK31 (Mx)                  JSP:17.37.6.99: Size = 166, Data: 00345C4101003031 E463EF0113108701 5A01FF6008F01CFE 81AB170000000003 EF01131087015A01 FF6008F01CFE81AB 170000000003EF01 131087015B01FF60 00F01CFE81701B00 00000003EF011310 87015B01FF6000F0 1CFE81701B000000 0003EF0113108701 5C01FF2000F01CFE 81CB240000000003 EF01131087015C01 57CC00F01CFE81CB 240000000003EF01 131087015D01FF20 00F01CFE815B2900 00000003EF011310 87015D01FF2000F0 1CFE815B29000000 0003EF0113108701 5E01FF6000F01CFE 819D280000000003 EF01131087015E01 FF6000F01CFE819D 0003
03/15 20:23:26.11: 04:23:26 L8 <Kx  JK49 (Kx)                  JSP:15.33.2.93: Size = 163, Data: 00647741000030EF 01131087015A01FF 6008F01CFE81AB17 0000000003EF0113 1087015A01FF6008 F01CFE81AB170000 000003EF01131087 015B01FF6000F01C FE81701B00000000 03EF01131087015B 01FF6000F01CFE81 701B0000000003EF 01131087015C01FF 2000F01CFE81CB24 0000000003EF0113 1087015C01FF2000 F01CFE81CB240000 000003EF01131087 015D01FF2000F01C FE815B2900000000 03EF01131087015D 01FF2000F01CFE81 5B290000000003EF 01131087015E01FF 6000F01CFE819D28 0000000003EF0113 1087015E01FF6000 F01CFE819D280000 A6220000000003
03/15 20:23:26.11: 04:23:26 L8 <Kx  JK21 (Kx)                  JSP:10.22.1.53:Size = 163, Data: 009D1141000030EF 01131087015A01FF 6008F01CFE81AB17 0000000003EF0113 1087015A01FF6008 F01CFE81AB170000 000003EF01131087 015B01FF6000F01C FE81701B00000000 03EF01131087015B 01FF6000F01CFE81 701B0000000003EF 01131087015C01FF 2000F01CFE81CB24 0000000003EF0113 1087015C01FF2000 F01CFE81CB240000 000003EF01131087 015D01FF2000F01C FE815B2900000000 03EF01131087015D 01FF2000F01CFE81 5B290000000003EF 01131087015E01FF 6000F01CFE819D28 0000000003EF0113 1087015E01FF6000 F01CFE819D280000 A6220000000003

I want to read the file line by line and apply a filter to it. For example, I want to extract all lines that contain L8 <Mx JK31 (Mx), pull out the time (04:23:26) and the size (166) from each, and plot a graph of size over time. I want to do this in Python.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Two spaces between "<Mx" and "JK31", as in the log file.
match = "L8 <Mx  JK31 (Mx)"

with open("test.txt") as fin:
    print(' : {}'.format(fin.name))
    for line in fin:
        if match in line:
            print(line)

I am able to extract all the lines with the expected text (if match in line:). How do I extract the time field and the Size field in Python?

Upvotes: 0

Views: 1352

Answers (3)

Spade

Reputation: 2280

Extending your approach without using other modules, the following solution can work:

match = "L8 <Mx  JK31 (Mx)"

with open("test.txt") as fin:
    print(' : {}'.format(fin.name))
    for line in fin:
        if match in line:
            print(line)
            # Size: the text between "Size = " and the next comma.
            sizeStart = line.find("Size = ")
            sizeEnd = line[sizeStart:].find(',')
            size = line[sizeStart+len("Size = "):sizeStart+sizeEnd]

            # time1: the field after the first space; the end index stops
            # one character early, which drops the trailing ':'.
            time1_start = line.find(" ")
            time1_end = line[time1_start+1:].find(" ")
            time1 = line[time1_start+1:time1_start+time1_end]

            print(size, time1)

Similarly, you can get time2, the 04:23:26 field the question asks for. I minimize my reliance on the re module because its syntax is cryptic and takes getting used to. Which approach is more readable is open to debate.
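
As a rough sketch of how time2 could be pulled out with the same find-based idea: the helper name field_after is made up for illustration, and the sample line is shortened from the question's data (hex payload truncated). It assumes the first ": " in the line is the one right before time2, which holds for the lines shown in the question.

```python
# Sample line shortened from the question's data (hex payload truncated).
line = ("03/12 20:23:26.11: 04:23:26 L8 <Mx  JK31 (Mx)"
        "                  JSP:17.37.6.99: Size = 166, Data: 00345C4101003031")


def field_after(text, marker, terminator):
    """Return the text between `marker` and the next `terminator`, or None.

    Hypothetical helper, not part of any library.
    """
    start = text.find(marker)
    if start == -1:
        return None
    start += len(marker)
    # Search for the terminator only after the marker.
    end = text.find(terminator, start)
    if end == -1:
        return None
    return text[start:end]


size = field_after(line, "Size = ", ",")
# The first ": " in the line sits right before the second timestamp.
time2 = field_after(line, ": ", " ")
print(size, time2)
```

Passing the start offset to find avoids slicing the string and then adding offsets back by hand, which is where off-by-one slips tend to creep in.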

Upvotes: 0

Cyb3rFly3r

Reputation: 1341

You could also use regular expressions, which allow more precise matching:

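import re

# `line` is one line read from the file, e.g. inside `for line in fin:`.
m = re.search(r':\s(\d\d:\d\d:\d\d) L8 \<Mx\s+JK31 \(Mx\).*Size = (\d+),', line)
if m:
    # found match: group 1 is the time, group 2 is the size
    print('Time: {}'.format(m.group(1)))
    print('Size: {}'.format(m.group(2)))
# else:
    # pattern was not found: handle it or error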
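
To show the pattern applied across several lines, here is a small self-contained sketch. The names sample_lines and extract are placeholders for illustration, and the lines are shortened copies of the question's data (hex payload truncated); with a real file you would iterate over the open file object instead.

```python
import re

# Same pattern as above, compiled once so it can be reused for every line.
pattern = re.compile(r':\s(\d\d:\d\d:\d\d) L8 <Mx\s+JK31 \(Mx\).*Size = (\d+),')

# Hypothetical stand-in for the file contents (payloads truncated).
sample_lines = [
    "03/12 20:23:26.11: 04:23:26 L9 <Mx  Acc  Magnum All            XDV:00111A0000000117",
    "03/12 20:23:26.11: 04:23:26 L8 <Mx  JK31 (Mx)                  JSP:17.37.6.99: Size = 166, Data: 00345C4101003031",
    "03/15 20:23:26.11: 04:23:26 L8 <Kx  JK49 (Kx)                  JSP:15.33.2.93: Size = 163, Data: 00647741000030EF",
]


def extract(lines):
    """Collect (time, size) pairs from the lines that match the pattern."""
    results = []
    for line in lines:
        m = pattern.search(line)
        if m:
            # group(1) is the time string, group(2) the size digits.
            results.append((m.group(1), int(m.group(2))))
    return results


records = extract(sample_lines)
print(records)
```

Only the JK31 line matches; the L9 line and the Kx line are skipped because the pattern requires the literal "L8 <Mx ... JK31 (Mx)" text.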

Upvotes: 1

Hackaholic

Reputation: 19733

You can extract the time and size like this:

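#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Two spaces between "<Mx" and "JK31", as in the log file.
match = "L8 <Mx  JK31 (Mx)"
with open("test.txt") as fin:
    print(' : {}'.format(fin.name))
    for line in fin:
        if match in line:
            # Split on whitespace: field 2 is the time, field 10 is "166,".
            fields = line.strip().split()
            time = fields[2]
            size = fields[10].strip(",")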
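
One caveat with fixed indices: in the question's sample, one line reads "JSP:10.22.1.53:Size = 163" with no space before "Size", which shifts every later field by one. A possible workaround, sketched below, is to take the token that follows "=" instead. The helper parse_fields is a made-up name, the sample lines are shortened from the question (hex payload truncated), and converting the time with datetime.strptime is just one option for getting values that can later go on a time axis.

```python
from datetime import datetime

# Lines shortened from the question's sample data (hex payload truncated).
sample_lines = [
    "03/12 20:23:26.11: 04:23:26 L8 <Mx  JK31 (Mx)                  JSP:17.37.6.99: Size = 166, Data: 00345C4101003031",
    "03/15 20:23:26.11: 04:23:26 L8 <Kx  JK21 (Kx)                  JSP:10.22.1.53:Size = 163, Data: 009D1141000030EF",
]


def parse_fields(line):
    """Return (time, size) from one log line; hypothetical helper."""
    fields = line.strip().split()
    # The third whitespace-separated field is the time of interest.
    when = datetime.strptime(fields[2], "%H:%M:%S").time()
    # Take the token right after "=" rather than a fixed index, so a
    # missing space before "Size" does not shift the result.
    size = int(fields[fields.index("=") + 1].strip(","))
    return when, size


points = [parse_fields(line) for line in sample_lines]
times = [t for t, _ in points]
sizes = [s for _, s in points]
print(times, sizes)
```

The two lists can then be handed to whatever plotting library you prefer to draw size over time.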

Upvotes: 1
