Damian

Reputation: 21

Using re and matching, how can I search and get certain data from a text file?

I have used re matching to extract certain lines from a text file, but I keep getting stuck when I try to pull specific values out of those lines with a similar technique. Below is the code I used to get the lines I need; the details of what I am after are at the end. Thank you in advance!

Data from text file:

-------------------------------------------------------------------------------------------------------------------------------------
   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5  6  7
  SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO
 121               192                  175               158                  168                              BLK NO.  101 DYS OFF  17
 CVGORD X  X  X  X AVPDSMORD X  X  X  X GRBDSMORD X  X  X PIALEXORD X  X  X  X CHALEXORD X  X                   CRD.   72.00 BLK.  58.31
  121= 0910/1255/0901; 192= 0810/1915/1536; 175= 0750/1218/0931; 158= 0730/1240/1359; 168= 0758/1239/1638;      TAFB  245.09 C/O    0.0

Code (edited: I had forgotten to include the myDict[key] assignment):

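import re

# filename and dashes1 (the pattern for the dashed separator line) are set elsewhere in my script
myDict = {}

with open(filename, 'r') as f:
    count = 0
    for line in f:
        matchObj = re.match(dashes1, line)
        if matchObj:
            count += 1
            strcount = str(count)
            data = ['', '', '', '']
            # Skip the day-number row and the weekday row
            f.readline()
            f.readline()
            # Keep the two rows that hold BLK NO./DYS OFF and CRD./BLK.
            data[0] = f.readline()
            data[1] = f.readline()
            key = "myData" + strcount
            myDict[key] = data
# No f.close() needed: the with block closes the file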



for key in myDict:
    print(key, '->', myDict[key])

My output is:

myData1 -> [' 121               192                  175               158                  168                              BLK NO.  101 DYS OFF  17\n', ' CVGORD X  X  X  X AVPDSMORD X  X  X  X GRBDSMORD X  X  X PIALEXORD X  X  X  X CHALEXORD X  X                   CRD.   72.00 BLK.  58.31\n', '', '']

I want the value after BLK NO. (101), the value after DYS OFF (17), the value after CRD. (72.00), and the value after BLK. (58.31).

I don't want to print the labels BLK NO., DYS OFF, CRD. or BLK. themselves, just the values that follow them. I tried the same re matching approach here but got stuck. Thanks in advance for the help!

Upvotes: 1

Views: 54

Answers (1)

Tim Biegeleisen

Reputation: 521249

I would keep things simple and just use re.findall here, after reading the entire file content into a string:

import re

inp = """-------------------------------------------------------------------------------------------------------------------------------------
1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5  6  7
SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO
121               192                  175               158                  168                              BLK NO.  101 DYS OFF  17
CVGORD X  X  X  X AVPDSMORD X  X  X  X GRBDSMORD X  X  X PIALEXORD X  X  X  X CHALEXORD X  X                   CRD.   72.00 BLK.  58.31
121= 0910/1255/0901; 192= 0810/1915/1536; 175= 0750/1218/0931; 158= 0730/1240/1359; 168= 0758/1239/1638;      TAFB  245.09 C/O    0.0"""

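# Raw strings keep the backslash escapes intact for the regex engine
keys = [r"BLK NO\.", r"DYS OFF", r"CRD\.", r"BLK\.", r"TAFB", r"C/O"]
# Capture the label, then the integer or decimal number that follows it
regex = "(" + "|".join(keys) + ")"
matches = re.findall(regex + r'\s+(\d+(?:\.\d+)?)', inp)
print(matches)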

This prints:

[('BLK NO.', '101'), ('DYS OFF', '17'), ('CRD.', '72.00'), ('BLK.', '58.31'),
 ('TAFB', '245.09'), ('C/O', '0.0')]
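
If the file holds several such blocks, the same findall idea could be applied per block and stored under keys like the myData1 entries in the question. The following is only a rough sketch under one assumption: every block starts with a line made up solely of dashes. The names parse_blocks, LABELS, FIELD_RE and SEPARATOR_RE, and the shortened sample text, are made up for illustration and do not come from the question.

```python
import re

# Labels taken from the sample data; "BLK NO." and "BLK." are separate alternatives.
LABELS = [r"BLK NO\.", r"DYS OFF", r"CRD\.", r"BLK\.", r"TAFB", r"C/O"]
# Group 1 is the label, group 2 the integer or decimal value after it.
FIELD_RE = re.compile("(" + "|".join(LABELS) + r")\s+(\d+(?:\.\d+)?)")
# Assumption: a block begins with a line consisting only of dashes.
SEPARATOR_RE = re.compile(r"^-{5,}[ \t]*$", re.MULTILINE)


def parse_blocks(text):
    """Return {"myData1": {label: value, ...}, ...}, one entry per block."""
    results = {}
    # Split on the dashed lines and drop empty pieces (e.g. before the first one).
    blocks = [b for b in SEPARATOR_RE.split(text) if b.strip()]
    for count, block in enumerate(blocks, start=1):
        # findall returns (label, value) tuples, which dict() accepts directly.
        results["myData" + str(count)] = dict(FIELD_RE.findall(block))
    return results


# Shortened, hypothetical sample in the same layout as the question's data.
sample = (
    "-" * 40 + "\n"
    " 121   192   BLK NO.  101 DYS OFF  17\n"
    " CVGORD X  X   CRD.   72.00 BLK.  58.31\n"
    "  121= 0910/1255/0901;   TAFB  245.09 C/O    0.0\n"
)

parsed = parse_blocks(sample)
for key, fields in parsed.items():
    print(key, "->", fields)
```

For a real file, the text would come from reading the whole file into a string first. The values are left as strings so that 72.00 keeps its trailing zeros; they can be converted with int() or float() where numbers are needed.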

Upvotes: 1
