Reputation: 3852
I'm trying to parse a file containing serial numbers, part numbers, and so on, and sort the values into a data structure. I would like to parse the file by keying off the identifiers, but I only need the actual numbers/codes for my data structure. I have to assume that the numbers/codes vary in length; however, I can rely on an identifier preceding each number/code and on a line break following each value.
//Text file with serials and information
Serial: 523524234235
Part Number: MHC-1251-A
Manufacturer: KNL-ETA
Serial: 523524281238
Part Number: QLC-851
Manufacturer: MHQ-MCE
.
.
.
Upvotes: 1
Views: 674
Reputation: 9317
You can apply a regular expression to each line to extract the part you want, like this:
>>> import re
>>> text = "Serial: 523524234235"
>>> m = re.search(r'Serial: (\d+)', text)
>>> m.group(1)
'523524234235'
You can also use split to break each line into two parts and then check the first part to see what kind of token it is (Serial, Part Number, etc.).
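A minimal sketch of the split approach, assuming every data line has the form "Identifier: value". The helper name `parse_line` and the inline sample lines are only illustrative and are not part of the original answer.

```python
def parse_line(line):
    """Split 'Identifier: value' into (identifier, value).

    Returns None for lines without a colon (e.g. blank or filler lines).
    """
    # Split on the first colon only, so a value may itself contain ':'.
    parts = line.split(':', 1)
    if len(parts) != 2:
        return None
    return parts[0].strip(), parts[1].strip()


# Sample lines taken from the question's file excerpt.
sample = [
    "Serial: 523524234235",
    "Part Number: MHC-1251-A",
    "Manufacturer: KNL-ETA",
]

for line in sample:
    parsed = parse_line(line)
    if parsed is None:
        continue
    kind, value = parsed
    # Check the first part to decide what kind of token this is.
    if kind == 'Serial':
        print('serial number:', value)
    elif kind == 'Part Number':
        print('part number:', value)
    else:
        print(kind + ':', value)
```

Splitting on the first colon only is a design choice here: it keeps any later colons inside the value rather than discarding them.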
Your regular expression could be more tolerant of whitespace around the value:
m = re.search(r'Serial: (\d+)', text) ==> m = re.search(r'Serial:\s*(\d+)\s*', text)
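As a rough illustration, the whitespace-tolerant idea can be extended to all three identifiers from the question. The names `LINE_RE` and `match_line` are hypothetical, and the pattern assumes the values themselves contain no whitespace.

```python
import re

# Identifier, colon, optional whitespace, then a value without spaces.
# Assumption: only these three identifiers occur in the file.
LINE_RE = re.compile(r'^(Serial|Part Number|Manufacturer):\s*(\S+)\s*$')


def match_line(line):
    """Return (identifier, value) if the line matches, otherwise None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


# Works whether or not there is whitespace after the colon.
print(match_line("Serial:523524234235"))
print(match_line("Part Number:   MHC-1251-A  "))
```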
Upvotes: 3
Reputation: 1953
I agree with @loki; from what you describe, a regex is not necessary. An appropriate structure could be built from a file like yours as follows:
parts = {}  # data structure: serial number -> entry
entry = {}  # entry currently being filled
with open('file.dat', 'r') as f:
    for line in f:
        # split on the first colon only, so a value may itself contain ':'
        flds = [fld.strip() for fld in line.split(':', 1)]
        if len(flds) > 1:
            k, v = flds
            if k == 'Serial':  # use serial number as key for corresponding entry
                entry = {}
                parts[v] = entry
            else:
                entry[k] = v  # save information in data set
Result:
{'523524234235': {'Part Number': 'MHC-1251-A', 'Manufacturer': 'KNL-ETA'}, '523524281238': {'Part Number': 'QLC-851', 'Manufacturer': 'MHQ-MCE'}, ...}
Upvotes: 1
Reputation: 387
Open the file, read its lines, iterate over them, and split each on ':' to get your numbers. You can use a regex if the values are not one per line.
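For the case where values are not one per line, a regex over the whole text might look like the sketch below. It assumes each record lists Serial, Part Number, and Manufacturer in that order and that values contain no whitespace; `RECORD_RE` and `parse_records` are illustrative names, and the sample text is the question's data joined onto fewer lines.

```python
import re

# One record: the three identifiers in a fixed order, separated by any
# whitespace (spaces or newlines). Values are assumed to have no spaces.
RECORD_RE = re.compile(
    r'Serial:\s*(\S+)\s+'
    r'Part Number:\s*(\S+)\s+'
    r'Manufacturer:\s*(\S+)'
)


def parse_records(text):
    """Map each serial number to a dict of its remaining fields."""
    parts = {}
    for serial, part_number, manufacturer in RECORD_RE.findall(text):
        parts[serial] = {
            'Part Number': part_number,
            'Manufacturer': manufacturer,
        }
    return parts


# The question's sample data, deliberately not split line by line.
text = (
    "Serial: 523524234235 Part Number: MHC-1251-A Manufacturer: KNL-ETA "
    "Serial: 523524281238 Part Number: QLC-851\nManufacturer: MHQ-MCE"
)
records = parse_records(text)
print(records)
```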
Upvotes: 2