Reputation: 3610
Is it possible to parse all of the "weights" from the two emails below?
I need a regex that captures only the "weights" from these two emails, and from hundreds more like them. The regex I'm using now matches numbers that contain a comma (digits, a comma, more digits). That works for weights in the thousands, but it misses weights below one thousand, such as the 954 lbs and 800 lbs values below.
I thought about matching "lbs" and capturing the number in front of it, but in some emails a price also precedes "lbs" (for example "@ 1.40 lbs" and ".60 lbs" in email #1).
Any help would be appreciated, thanks guys.
1) Subject: FW: NEFS 11 fish for lease
From: Claire Fitz-Gerald
Date: 11/15/2013 3:02 PM
NEFS 11 has the following fish for lease:
-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
-American Plaice 2,000 lbs .60 lbs or best offer
2) From: Claire Fitz-Gerald
Date: 9/5/2014 9:52 AM
Subject: NEFS 5 Available Fish
All,
NEFS 5 has the following fish available for lease/trade:
GB EAST cod: 954 lbs @ $0.83
GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
GB blackback: 30,000 lbs @ $0.07
GOM blackback: 800 lbs @ $0.03
white hake: 6,322 lbs @ $0.13
pollock: 22,000 lbs @ $0.015
redfish: 14,000 lbs @ $0.015
GB yt: 1,873 lbs @ $1.13
GB yt: 5,127 lbs to trade for 10,254 lbs SNE yt
My relevant code:
import re

with open(file_path, 'r') as f:
    # digits, a comma, digits, a space: only numbers that contain a comma can match
    pattern = re.compile(r'\d+,\d+ ')
    email = f.read()
    weights = pattern.findall(email)
    data_frame['Weights'].append(weights)
    if weights:
        print("Weight:", ''.join(weights))
Output for email #2 (note that the amounts below 1,000 are missing):
Weight: 1,046 1,830 30,000 6,322 22,000 14,000 1,873 5,127 10,254
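To make both failure modes concrete, here is a small sketch run against a few lines copied from the two emails (the names `sample`, `comma_only` and `naive` are only for illustration). The comma pattern from the code above cannot match 954 or 800, and the naive "number right before lbs" idea finds them but also picks up the prices.

```python
import re

# A few lines copied from the two emails above
sample = """-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
-American Plaice 2,000 lbs .60 lbs or best offer
GB EAST cod: 954 lbs @ $0.83
GOM blackback: 800 lbs @ $0.03
GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
"""

# The pattern from the question: a number without a comma can never match
comma_only = re.findall(r'\d+,\d+ ', sample)
print(comma_only)  # ['5,000 ', '2,000 ', '1,046 ', '1,830 ']

# Naive "number right before lbs": finds 954 and 800, but also the prices
naive = re.findall(r'([\d.,]+) lbs', sample)
print(naive)  # ['5,000', '1.40', '2,000', '.60', '954', '800', '1,046', '1,830']
```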
Upvotes: 3
Views: 92
Reputation: 43169
There are a couple of ways; one is:
\d[\d,]{2,} lbs
This requires a digit, followed by at least two more digits or commas, then a space and the literal text lbs. See a demo on regex101.com.
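The demo below uses email #2. Email #1 contains the price-before-"lbs" case from the question, so it is worth checking separately. A minimal sketch (the variable `email1` holds lines copied from that email):

```python
import re

# Lines from email #1, where prices are written as "1.40 lbs" and ".60 lbs"
email1 = """NEFS 11 has the following fish for lease:
-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
-American Plaice 2,000 lbs .60 lbs or best offer
"""

# \d[\d,]{2,} needs at least three digits/commas in a row, so the "40" in
# "1.40" and the "60" in ".60" are too short to match before " lbs".
rx = re.compile(r'(\d[\d,]{2,}) lbs')
print(rx.findall(email1))  # ['5,000', '2,000']
```

Note the trade-off of the `{2,}` minimum: a weight under 100 (say "50 lbs") would be skipped, and a three-decimal price written as "0.015 lbs" would be captured as "015".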
Python:
import re
email = """
2) From: Claire Fitz-Gerald
Date: 9/5/2014 9:52 AM
Subject: NEFS 5 Available Fish
All,
NEFS 5 has the following fish available for lease/trade:
GB EAST cod: 954 lbs @ $0.83
GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
GB blackback: 30,000 lbs @ $0.07
GOM blackback: 800 lbs @ $0.03
white hake: 6,322 lbs @ $0.13
pollock: 22,000 lbs @ $0.015
redfish: 14,000 lbs @ $0.015
GB yt: 1,873 lbs @ $1.13
GB yt: 5,127 lbs to trade for 10,254 lbs SNE yt
"""
rx = re.compile(r'(\d[\d,]{2,}) lbs')
weights = rx.findall(email)
print(weights)
See it working on ideone.com.
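If weights under 100 can appear, or some prices have three decimals, a stricter variant may help. This is a sketch, not part of the answer above, assuming every weight is an integer (optionally with thousands commas) followed by a space and "lbs". A negative lookbehind rejects any match that starts right after a digit, dot or comma, so the fractional part of a price is never picked up. The names `WEIGHT_RX` and `extract_weights` are hypothetical, and converting to `int` is an assumption about what you want to store in `data_frame['Weights']`.

```python
import re

# Hypothetical variant: a match may not start right after a digit, '.' or ','
# so "1.40 lbs", ".60 lbs" and "0.015 lbs" are rejected, while "50 lbs" is kept.
WEIGHT_RX = re.compile(r'(?<![\d.,])(\d[\d,]*) lbs')


def extract_weights(text):
    """Return every weight in text as an int, in order of appearance."""
    return [int(w.replace(',', '')) for w in WEIGHT_RX.findall(text)]


email = """GB EAST cod: 954 lbs @ $0.83
GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
-American Plaice 2,000 lbs .60 lbs or best offer
pollock: 22,000 lbs @ $0.015
"""

print(extract_weights(email))  # [954, 1046, 1830, 5000, 2000, 22000]
```

One side effect of the lookbehind: a weight glued to a preceding comma with no space (e.g. "cod,954 lbs") would also be skipped.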
Upvotes: 1