Reputation: 175
I have a script that reads emails, pulls information out of each one, and saves it to my sql database. However, it does not insert anything for the phone number.
This is my code to determine the Phone Number:
if (preg_match('|^<b>Phone(.*)>\s*(\S*)<?|U', $lines[$i], $matches)) {
    $phone = trim($matches[2]);
}
An example email would be like this:
Name: Joe Schmoe
E-mail Address: [email protected]
Phone: 555-555-5555
Here is a Source Sample of what the Email provides:
<b>Phone:</b> 555-555-5555</font><br>
It seems the $phone variable ends up empty or null, as it isn't being inserted in the database while all my other information is.
Any suggestions on this matter?
Upvotes: 0
Views: 154
Reputation: 34395
Here is a cleaned up regex that should do the trick for you. It allows digit sequences to be optionally separated by either spaces or hyphens:
$re = '% # Rev:20111101
    # Match phone number after "phone:</b>".
    phone:       # Literal text: "phone:".
    \s*          # Optional (zero or more) whitespace.
    </b>         # Literal text: "</b>".
    \s*          # Optional whitespace.
    (            # Capture group $1:
      [0-9]+     # {normal+} One or more digits.
      (?:        # Group for optional digit separators.
        [ -]     # {special} Digit separator.
        [0-9]+   # {normal+} One or more further digits.
      )*         # End {(special normal+)*} construct.
    )            # End $1: Phone number.
    \s*          # Optional whitespace.
    <            # Ensure number followed by literal "<".
    %ix'; // Use 'x'-free-spacing and 'i'-case-insensitive mode.
if (preg_match($re, $lines[$i], $matches)) {
    $phone = $matches[1];
}
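A quick way to check a pattern like this is to run it against the sample line from the question. The sketch below uses a compact single-line equivalent and assumes the tag after the label is `</b>`, as in that sample. The function name `extractPhone` is made up for illustration.

```php
<?php
// Hypothetical helper: returns the phone number from one HTML line, or null.
// Assumes the label is followed by "</b>", as in the question's sample line.
function extractPhone(string $line): ?string
{
    // Digits, optionally separated by single spaces or hyphens, followed by "<".
    $re = '%phone:\s*</b>\s*([0-9]+(?:[ -][0-9]+)*)\s*<%i';
    if (preg_match($re, $line, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(extractPhone('<b>Phone:</b> 555-555-5555</font><br>')); // string "555-555-5555"
var_dump(extractPhone('<b>Phone:</b> 555 555 5555</font><br>')); // string "555 555 5555"
var_dump(extractPhone('<b>Name:</b> Joe Schmoe</font><br>'));    // NULL
```

Numbers written with parentheses or dots would not match this pattern, so the separator class would need extending if the emails contain those.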
A note on the U ungreedy modifier: using it is NOT best practice and it should be avoided. When you need to make an individual quantifier lazy, just add ? to that specific quantifier. The U mode modifier is never needed or warranted; all it does is confuse the reader.
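To make the difference concrete, here is a small sketch comparing a greedy quantifier, a per-quantifier lazy `?`, and the `U` modifier. The input string is invented for the demonstration.

```php
<?php
// Invented input with two bold labels on one line.
$html = '<b>Name:</b> Joe <b>Phone:</b> 555';

// Greedy: .* runs to the LAST </b> on the line.
preg_match('%<b>(.*)</b>%', $html, $greedy);

// Lazy on this one quantifier only: .*? stops at the FIRST </b>.
preg_match('%<b>(.*?)</b>%', $html, $lazy);

// U flips every quantifier in the pattern, so .* behaves like .*? here.
preg_match('%<b>(.*)</b>%U', $html, $ungreedy);

var_dump($greedy[1]);   // string "Name:</b> Joe <b>Phone:"
var_dump($lazy[1]);     // string "Name:"
var_dump($ungreedy[1]); // string "Name:"
```

The second and third patterns give the same result, but only the second shows at the quantifier itself that it is lazy.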
Edit 2011-11-01 3:14pm MDT "Broke down" regex by rewriting it in free-spacing mode and added lots-o-comments.
Upvotes: 1
Reputation: 10067
I would try something more reliable, without HTML tags involved:
|\bPhone:\s+(\S*)|
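A minimal sketch of how that pattern could be applied. It assumes the tags are removed first with `strip_tags()`, which is one possible way to get a tag-free line and is not something the answer specifies. On the raw HTML line the pattern does not match, because `</b>` sits between the colon and the whitespace.

```php
<?php
$htmlLine = '<b>Phone:</b> 555-555-5555</font><br>';

// Against the raw HTML the pattern fails: "Phone:" is followed by "<", not whitespace.
$rawMatched = preg_match('|\bPhone:\s+(\S*)|', $htmlLine, $rawMatches);

// Assumption: strip the tags first, leaving "Phone: 555-555-5555".
$textLine = strip_tags($htmlLine);
$phone = null;
if (preg_match('|\bPhone:\s+(\S*)|', $textLine, $matches) === 1) {
    $phone = $matches[1];
}

var_dump($rawMatched); // int(0)
var_dump($phone);      // string "555-555-5555"
```

Since `\S*` stops at the first whitespace, a number written with spaces, such as 555 555 5555, would be cut off after the first group of digits.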
Upvotes: 0
Reputation: 360662
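Your pattern uses the U modifier, which inverts greediness, so every quantifier in it is lazy. The first (.*) stops at the first >, and then \s*, (\S*) and <? are all satisfied by matching nothing, so the match succeeds with an empty $matches[2]. Without U, the first (.*) would match in greedy mode, and you'd find that ALL of the text in the string from Phone onwards to the last > has been slurped up by that capture group and is in $matches[1]. Either way, $matches[2] ends up empty.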
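Dumping the capture groups makes this easy to see. The sketch below runs the question's pattern against the question's sample line, once with the `U` modifier and once without. The last pattern is only a hypothetical minimal fix, not the asker's code.

```php
<?php
$line = '<b>Phone:</b> 555-555-5555</font><br>';

// Pattern exactly as posted: U makes every quantifier lazy.
preg_match('|^<b>Phone(.*)>\s*(\S*)<?|U', $line, $withU);

// Same pattern without U: every quantifier is greedy.
preg_match('|^<b>Phone(.*)>\s*(\S*)<?|', $line, $withoutU);

var_dump($withU[1]);    // string ":</b"
var_dump($withU[2]);    // string "" (nothing forces \S* to consume anything)
var_dump($withoutU[1]); // string ":</b> 555-555-5555</font><br"
var_dump($withoutU[2]); // string ""

// Hypothetical minimal fix: spell out the closing tag and capture up to the next "<".
preg_match('|^<b>Phone:</b>\s*([^<]+)<|', $line, $fixed);
var_dump(trim($fixed[1])); // string "555-555-5555"
```

In both variants the second group is empty, which would explain why nothing useful reaches the database.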
Does the <b> in the pattern indicate you're working on an HTML string? You shouldn't use regexes on HTML, as they can/will blow up on you. Use DOM instead to find the phone number node, and then extract the node's text content. You can then use a simple substring expression to split the phone number text into Phone: and 555-555-5555.
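A rough sketch of that DOM approach, assuming each entry of `$lines` holds one HTML fragment like the sample in the question. The function name `phoneFromHtmlLine` is hypothetical, and libxml warnings are suppressed because the fragment has a stray `</font>`.

```php
<?php
// Hypothetical helper: parse one HTML fragment and return the phone number, or null.
function phoneFromHtmlLine(string $htmlLine): ?string
{
    if (trim($htmlLine) === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $loaded = $doc->loadHTML($htmlLine);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $doc->documentElement === null) {
        return null;
    }

    // Text content with all tags removed, e.g. "Phone: 555-555-5555".
    $text = trim($doc->documentElement->textContent);

    // Split into label and value at the first colon.
    $parts = explode(':', $text, 2);
    if (count($parts) !== 2 || strcasecmp(trim($parts[0]), 'Phone') !== 0) {
        return null;
    }
    return trim($parts[1]);
}

var_dump(phoneFromHtmlLine('<b>Phone:</b> 555-555-5555</font><br>')); // string "555-555-5555"
```

If the whole email body is available as one document, loading it once and querying it with `DOMXPath` would probably be a better fit than parsing line by line.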
Upvotes: 0