Reputation: 21
I have used re matching to extract certain data from a text file, but I keep getting stuck when I try to pull out specific values with a similar technique. Below is the code I used to get the lines I need; the details of what I want are after the code. Thank you in advance!
Data from text file:
-------------------------------------------------------------------------------------------------------------------------------------
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5 6 7
SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO
121 192 175 158 168 BLK NO. 101 DYS OFF 17
CVGORD X X X X AVPDSMORD X X X X GRBDSMORD X X X PIALEXORD X X X X CHALEXORD X X CRD. 72.00 BLK. 58.31
121= 0910/1255/0901; 192= 0810/1915/1536; 175= 0750/1218/0931; 158= 0730/1240/1359; 168= 0758/1239/1638; TAFB 245.09 C/O 0.0
Code (edited to add the myDict[key] assignment I had forgotten):
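import re

myDict = {}
# filename and dashes1 (the pattern for the dashed separator line) are defined elsewhere
with open(filename, 'r') as f:
    count = 0
    for line in f:
        matchObj = re.match(dashes1, line)
        if matchObj:
            count += 1
            strcount = str(count)
            data = ['', '', '', '']
            f.readline()  # skip the day-number line
            f.readline()  # skip the weekday line
            data[0] = f.readline()
            data[1] = f.readline()
            key = "myData" + strcount
            myDict[key] = data
# no f.close() needed: the with block closes the file
for key in myDict:
    print(key, '->', myDict[key])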
My output is:
myData1 -> [' 121 192 175 158 168 BLK NO. 101 DYS OFF 17\n', ' CVGORD X X X X AVPDSMORD X X X X GRBDSMORD X X X PIALEXORD X X X X CHALEXORD X X CRD. 72.00 BLK. 58.31\n', '', '']
I want the value that follows each label: 101 after BLK NO., 17 after DYS OFF, 72.00 after CRD., and 58.31 after BLK.
I don't want to print the labels themselves (BLK NO., DYS OFF, CRD., BLK.), only the values after them. I tried the same approach with re matching but got stuck. Thank you in advance for the help!
Upvotes: 1
Views: 54
Reputation: 521249
I would keep things simple and just use re.findall here, after reading the entire content into a string:
import re

inp = """-------------------------------------------------------------------------------------------------------------------------------------
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5 6 7
SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO TU WE TH FR SA SU MO
121 192 175 158 168 BLK NO. 101 DYS OFF 17
CVGORD X X X X AVPDSMORD X X X X GRBDSMORD X X X PIALEXORD X X X X CHALEXORD X X CRD. 72.00 BLK. 58.31
121= 0910/1255/0901; 192= 0810/1915/1536; 175= 0750/1218/0931; 158= 0730/1240/1359; 168= 0758/1239/1638; TAFB 245.09 C/O 0.0"""
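# Raw strings keep the regex escapes (\.) intact and avoid invalid-escape warnings
keys = [r"BLK NO\.", r"DYS OFF", r"CRD\.", r"BLK\.", r"TAFB", r"C/O"]
regex = "(" + "|".join(keys) + ")"
# Capture each label, then the integer or decimal number that follows it
matches = re.findall(regex + r'\s+(\d+(?:\.\d+)?)', inp)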
print(matches)
This prints:
[('BLK NO.', '101'), ('DYS OFF', '17'), ('CRD.', '72.00'), ('BLK.', '58.31'),
('TAFB', '245.09'), ('C/O', '0.0')]
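Since the question asks for the values without their labels, the (label, value) pairs above can be turned into a dictionary, or reduced to the values alone. The following is only a sketch under some assumptions: the two strings in lines stand in for data[0] and data[1] from the question's code, and the names LABELS, PATTERN and extract_values are made up for this illustration.

```python
import re

# Stand-ins for data[0] and data[1] from the question (assumed contents)
lines = [
    " 121 192 175 158 168 BLK NO. 101 DYS OFF 17",
    " CVGORD X X X X AVPDSMORD X X X X GRBDSMORD X X X PIALEXORD X X X X CHALEXORD X X CRD. 72.00 BLK. 58.31",
]

# Same labels as in the answer, written as raw strings
LABELS = [r"BLK NO\.", r"DYS OFF", r"CRD\.", r"BLK\.", r"TAFB", r"C/O"]
PATTERN = re.compile("(" + "|".join(LABELS) + r")\s+(\d+(?:\.\d+)?)")


def extract_values(text):
    """Return {label: value_string} for every label found in text."""
    # findall yields (label, value) tuples, which dict() accepts directly
    return dict(PATTERN.findall(text))


values = {}
for line in lines:
    values.update(extract_values(line))

print(values)
# {'BLK NO.': '101', 'DYS OFF': '17', 'CRD.': '72.00', 'BLK.': '58.31'}

# Only the values, in the order they were found
print(list(values.values()))
# ['101', '17', '72.00', '58.31']
```

In the question's loop, the same helper could be applied to data[0] and data[1] before storing the result under myDict[key]. The values stay as strings here; wrap them in float() if numbers are needed.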
Upvotes: 1