Matt

Reputation: 3852

Extract data using regular expressions in python

I'm trying to parse a file of serial numbers, part numbers, and so on, and sort them into a structure. I would like to key off the identifiers while parsing, but I only need the actual numbers/codes in my data structure. The numbers/codes vary in length, but I can rely on each one being preceded by its identifier and followed by a line break.

//Text file with serials and information
Serial: 523524234235
Part Number: MHC-1251-A
Manufacturer: KNL-ETA

Serial: 523524281238
Part Number: QLC-851
Manufacturer: MHQ-MCE

.
.
.

Upvotes: 1

Views: 674

Answers (3)

Ashwinee K Jha

Reputation: 9317

On each line you can apply a regular expression to extract the desired part, like this:

>>> import re
>>> text = "Serial: 523524234235"
>>> m = re.search(r'Serial: (\d+)', text)
>>> m.group(1)
'523524234235'

You can also use split to break each line into two parts, then check the first part to see which kind of token it is (Serial, Part Number, etc.).
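The split idea could look roughly like this. It is only a sketch: the helper name parse_line is made up for illustration, and str.partition is used instead of split so that a line without a colon is easy to detect.

```python
def parse_line(line):
    """Return (identifier, value) for an 'Identifier: value' line, else None."""
    # partition splits on the first ':' only and reports whether one was found
    key, sep, value = line.partition(':')
    if not sep:
        return None  # blank line or a line without an identifier
    return key.strip(), value.strip()


token, value = parse_line("Part Number: MHC-1251-A")
if token == "Serial":
    print("serial number:", value)
elif token == "Part Number":
    print("part number:", value)
```

Checking the first element of the returned pair then tells you which kind of token the line holds.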

The regular expression can be improved so that it tolerates any amount of whitespace around the value:

m = re.search(r'Serial: (\d+)', text) ==> m = re.search(r'Serial:[\s]*(\d+)[\s]*', text)
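As a quick check of what the optional whitespace buys you, here is a small sketch. The spacing variants in samples are invented for illustration; only the first one appears in the question.

```python
import re

# \s* allows zero or more whitespace characters after the colon
pattern = re.compile(r'Serial:\s*(\d+)')

# made-up spacing variants of the sample line from the question
samples = [
    "Serial: 523524234235",
    "Serial:523524234235",
    "Serial:   523524234235  ",
]
serials = [pattern.search(s).group(1) for s in samples]
print(serials)
```

All three variants yield the same captured digits. The trailing [\s]* does not change what the group captures, so it is left out in this sketch.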

Upvotes: 3

J. Katzwinkel

Reputation: 1953

I agree with @loki; from what you describe, a regex is not necessary. A suitable structure can be built from a file like yours as follows:

parts = {}  # data structure: serial number -> entry
entry = {}  # fields of the record currently being read
with open('file.dat', 'r') as f:
    for line in f:
        # split on the first ':' only, so a value containing ':' stays intact
        flds = [fld.strip() for fld in line.split(':', 1)]
        if len(flds) > 1:
            k, v = flds
            if k == 'Serial':  # use serial number as key for corresponding entry
                entry = {}
                parts[v] = entry
            else:
                entry[k] = v  # save information in data set

Result:

{'523524234235': {'Part Number': 'MHC-1251-A', 'Manufacturer': 'KNL-ETA'}, '523524281238': {'Part Number': 'QLC-851', 'Manufacturer': 'MHQ-MCE'}, ...}

Upvotes: 1

loki

Reputation: 387

Open the file, read it with readlines, iterate over the lines, and split each one on ':' to get your numbers. You can use a regex if the values are not laid out one per line.
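For the case where the values are not one per line, a regex over the whole text might look like the sketch below. The single-line layout in one_line is a made-up example, and find_records is a hypothetical helper name; the pattern also assumes the three fields always appear in this order and that the codes contain no spaces.

```python
import re

# \s+ between fields matches spaces as well as line breaks,
# so the same pattern covers both layouts
RECORD = re.compile(
    r'Serial:\s*(\d+)\s+Part Number:\s*(\S+)\s+Manufacturer:\s*(\S+)'
)


def find_records(text):
    """Return one dict per Serial/Part Number/Manufacturer group found."""
    return [
        {'Serial': s, 'Part Number': p, 'Manufacturer': m}
        for s, p, m in RECORD.findall(text)
    ]


# hypothetical input with a whole record on a single line
one_line = "Serial: 523524234235 Part Number: MHC-1251-A Manufacturer: KNL-ETA"
print(find_records(one_line))
```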

Upvotes: 2
