SpYk3HH
SpYk3HH

Reputation: 22580

Need help writing regex with varying multiple lines and pull SUBSTRING in MySQL

I'm trying to pull data of Shipping info from a string containing line breaks. The shipping info maybe between 3 and 5 lines. Here's a few examples:

    /* Just need from here */
Ship NEXT DAY PM to:
John Doe
Address Street
Sweet, NC 55555
    /* to here */
Email: [email protected]
Phone: 555-555-555

Ship NEXT DAY AM to:
John Doe
Address Street
Apt. 555
Sweet, NC 55555

Email: [email protected]
Phone: 555-555-555

Ship NEXT DAY PM to:
John Doe
c/o Jane Doe
Address Street
Apt. 555
Sweet, NC 55555

Email: [email protected]
Phone: 555-555-555

UPDATE

I finally figured out a solution for the regex. Also one was mentioned in comments that seems to work well. Not sure which would be better based on any "why" but here they are:

Ship (.)+ to\:((\n)(.)+){1,5}

Ship (.)+ to\:[\s\S]*?(?=\nEmail)

Now I've got a whole new issue. Turns out, MySQL REGEXP does not function as I expected (which I would have known had I bothered to read rather than skim API). So Now I'm trying to pull what I just regexed out of a substring IF it exist. The IF part I got, I simply use the REGEX to check if Ship To is found and if not NULL then start on substring extraction. The Substring extraction is where I'm stuck now as it's proving rather tedious in using LOCATE in multiple combinations to try and get exact character position. Anyone know a quick and easy way? I'll post what I'm working on as soon as I don't get an ERROR, lol.

So far working the following into If statement:

MID(`Notes`, (LOCATE(' to:\n', `Notes`) + 5), (LOCATE('\n\nEmail:', `Notes`) - LOCATE(' to:\n', `Notes`) - 5)) as ShipAddress

Upvotes: 2

Views: 373

Answers (2)

SpYk3HH
SpYk3HH

Reputation: 22580

My Final Solution:

While the suggestion of the extra lib was probably a good way to go, I'd rather just go with what's available. In the end I made use of the Regex and a simple CASE Statement as follows:

CASE
    WHEN `Notes` REGEXP 'Ship (.)* to\:((\n)(.)+){1,5}'
    THEN MID(`Notes`, (LOCATE(' to:\n', `Notes`) + 5), (LOCATE('\n\nEmail:', `Notes`) - LOCATE(' to:\n', `Notes`) - 5))
END as 'ShipAddress',

Of course this requires knowing the exact layout of each note being parsed. Luckily I wrote that code to and I do know the exact layout saved to the DB every time. :)

Upvotes: 0

Alex Filipovici
Alex Filipovici

Reputation: 32561

The following pattern should work:

Ship (.)+ to\:[\s\S]*?(?=\nEmail)

Here's explained what it does.

For the MySQL part, maybe you could use the lib_mysqludf_preg UDF which is also referenced here: https://stackoverflow.com/a/2742764/674700.

Upvotes: 1

Related Questions