theprowler

Reputation: 3610

Parsing different numbers with RegEx

Is it possible to parse all of the "weights" from the two emails below?

I need a RegEx that captures only the "weights" from these two emails and from hundreds more like them. The RegEx I'm using now searches for commas and takes the numbers on either side of them. That works for weights in the thousands, but it misses weights below one thousand, such as the 954 lbs and 800 lbs values below.

I have considered matching "lbs" and capturing the number that precedes it, but in some cases the price precedes "lbs" (for example, "1.40 lbs" in the first email).

Any help would be appreciated, thanks guys.

1) Subject: FW: NEFS 11 fish for lease
   From: Claire Fitz-Gerald 
   Date: 11/15/2013 3:02 PM

   NEFS 11 has the following fish for lease:

   -GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
   -American Plaice 2,000 lbs      .60 lbs or best offer



2) From: Claire Fitz-Gerald 
   Date: 9/5/2014 9:52 AM
   Subject: NEFS 5 Available Fish

   All,
   NEFS 5 has the following fish available for lease/trade:

     GB EAST cod: 954 lbs @ $0.83
     GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
     GB blackback: 30,000 lbs @ $0.07
     GOM blackback: 800 lbs @ $0.03
     white hake: 6,322 lbs @ $0.13
     pollock: 22,000 lbs @ $0.015
     redfish: 14,000 lbs @ $0.015
     GB yt: 1,873 lbs @ $1.13
     GB yt: 5,127 lbs to trade for 10,254 lbs SNE yt

My relevant code:

with open(file_path, 'r') as f:
    pattern = re.compile(r'\d+,\d+ ')
    email = f.read()
    weights = pattern.findall(email)
    data_frame['Weights'].append(weights)
    if weights:
        print("Weight:", ''.join(weights))

Printout for email #2 (note that the amounts below 1,000 are excluded):

Weight: 1,046 1,830 30,000 6,322 22,000 14,000 1,873 5,127 10,254 

Upvotes: 3

Views: 92

Answers (1)

Jan

Reputation: 43169

There are a couple of ways to do this, one being

\d[\d,]{2,} lbs

This requires a digit, followed by at least two more digits or commas, then a space and the literal lbs. See a demo on regex101.com.
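To make the effect of the {2,} quantifier concrete, here is a small check against the two lines from email #1 in the question, where a price sits directly before "lbs". It is only a sketch: the list name lines is made up for illustration. Prices such as 1.40 and .60 are skipped because only two characters follow the decimal point, which is too short for the pattern. By the same logic, a weight under 100 lbs would also be skipped.

```python
import re

rx = re.compile(r'\d[\d,]{2,} lbs')

# Lines copied from email #1 in the question; the name "lines" is illustrative.
lines = [
    "-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs",
    "-American Plaice 2,000 lbs      .60 lbs or best offer",
]

# One list of matches per line; the prices before "lbs" are not matched.
matches = [rx.findall(line) for line in lines]
print(matches)  # [['5,000 lbs'], ['2,000 lbs']]

# Caveat: a two-digit weight is too short for \d[\d,]{2,}.
print(rx.findall("50 lbs"))  # []
```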


In full Python:

import re

email = """
2) From: Claire Fitz-Gerald 
   Date: 9/5/2014 9:52 AM
   Subject: NEFS 5 Available Fish

   All,
   NEFS 5 has the following fish available for lease/trade:

     GB EAST cod: 954 lbs @ $0.83
     GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
     GB blackback: 30,000 lbs @ $0.07
     GOM blackback: 800 lbs @ $0.03
     white hake: 6,322 lbs @ $0.13
     pollock: 22,000 lbs @ $0.015
     redfish: 14,000 lbs @ $0.015
     GB yt: 1,873 lbs @ $1.13
     GB yt: 5,127 lbs to trade for 10,254 lbs SNE yt
"""

rx = re.compile(r'(\d[\d,]{2,}) lbs')
weights = rx.findall(email)
print(weights)

See it working on ideone.com.
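If weights under 100 lbs can occur, or if you want integers rather than strings, a somewhat stricter variant may help. This is a sketch under my own assumptions, not part of the pattern above: WEIGHT_RX and parse_weights are hypothetical names, and the lookbehind assumes that a price always contains a decimal point, so that the digits right before "lbs" in a price are preceded by a period or another digit.

```python
import re

# Hypothetical stricter pattern: a number with optional thousands separators,
# not preceded by a digit, period, or comma, followed by optional whitespace
# and the word "lbs".
WEIGHT_RX = re.compile(r'(?<![\d.,])(\d[\d,]*)\s*lbs\b')

def parse_weights(text):
    """Return every weight found in text as an int (commas removed)."""
    return [int(m.replace(',', '')) for m in WEIGHT_RX.findall(text)]

# Excerpts of the two emails from the question.
email_1 = """
-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
-American Plaice 2,000 lbs      .60 lbs or best offer
"""

email_2 = """
GB EAST cod: 954 lbs @ $0.83
GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
GOM blackback: 800 lbs @ $0.03
"""

print(parse_weights(email_1))  # [5000, 2000]
print(parse_weights(email_2))  # [954, 1046, 1830, 800]
```

One limitation: a whole-number price written directly before "lbs" (say, "@ 1 lbs") would still be captured as a weight, so check this against a sample of your real emails first.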

Upvotes: 1
