Frank

Reputation: 185

How would I parse this?

I have an email that looks like this:

We’ve received a request to change your email address to [email protected].

To complete the process, please verify your email address by entering the following verification code.

86761G

This code is temporary and will expire in 30 minutes.

If this wasn’t requested by you, your account information will remain unchanged. No further action is required.

Warm regards, Example.com

I need to parse out the verification code (86761G here). The catch is that the code is dynamic: it changes every time. What IS static, though, is the layout of the email, so my thought was to grab it by line index [2]. (Even though there appear to be blank lines in between, it's the third <p> tag in the div, hence index [2].) My other idea was to do it via the HTML somehow (I don't really want to use HTMLAgilityPack). The HTML for the div is as follows:

<td colspan="2" style="padding:1.2em 45px 2em 45px;color:#000;font-family:Corbel, 'Trebuchet MS', 'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:.875em;line-height:1.1em;">
<p>We’ve received a request to change your email address to [email protected].</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>


<p>Warm regards,<br>
example.com</p>
</td>

Any idea how to parse this data out? I was thinking regex if possible, even though I know regex isn't meant for HTML, since HTML isn't a regular language. If I need HTMLAgilityPack I'll use it, but I'd prefer not to. Thank you guys!

Oh, side note: I'm using Firefox via Selenium, so there's always the option of using its built-in functions to grab it somehow?

Edit: I'm so stupid. Selenium: FindElementByXPath (facepalm)
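For reference, the plain-text idea from the question (drop the blank spacer lines and take index [2]) can be sketched as below. This is a minimal Python illustration, assuming the email body is already available as plain text; `code_by_line_index` is a hypothetical helper and the address is a placeholder. It relies entirely on the layout never changing.

```python
# Plain-text body of the email, as in the question; the address is a placeholder.
EMAIL_TEXT = """We’ve received a request to change your email address to user@example.com.

To complete the process, please verify your email address by entering the following verification code.

86761G

This code is temporary and will expire in 30 minutes.

If this wasn’t requested by you, your account information will remain unchanged. No further action is required.

Warm regards, Example.com
"""


def code_by_line_index(text, index=2):
    """Return the non-blank line at `index` (0-based), mirroring the [2] idea."""
    # Drop the blank spacer lines so only the paragraph lines are counted.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[index]


print(code_by_line_index(EMAIL_TEXT))  # 86761G
```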

Upvotes: 1

Views: 104

Answers (4)

Eduardo Wada

Reputation: 2647

If you are using Selenium, the simplest way is most likely to match it with the following CSS selector: p:nth-child(3)
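The same "third <p>" idea works without a browser. The sketch below is a stand-in, not Selenium: it uses Python's standard-library `html.parser` to collect the text of each <p> in the question's markup (style attribute dropped, placeholder address) and takes index 2. `ParagraphCollector` is a hypothetical name. Note that `p:nth-child(3)` counts all element children of the parent, which matches "third <p>" here only because every child of the <td> is a <p>.

```python
from html.parser import HTMLParser

HTML = """<td colspan="2">
<p>We’ve received a request to change your email address to user@example.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>
<p>Warm regards,<br>
example.com</p>
</td>"""


class ParagraphCollector(HTMLParser):
    """Collects the text of every <p> element, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text inside a <p> (including text after an inner <br>) joins that paragraph.
        if self._in_p:
            self.paragraphs[-1] += data


collector = ParagraphCollector()
collector.feed(HTML)
# Third <p> inside the <td>, i.e. what p:nth-child(3) selects here.
print(collector.paragraphs[2].strip())  # 86761G
```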

Upvotes: 1

Eddy K

Reputation: 216

You can use the following regular expression if the email is exactly the same every time except for the changing code:

(?<d>\<p\>[\S^\.]*</p\>)
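A quick way to check this pattern, sketched in Python (an illustration only; Python spells a named group `(?P<d>...)` instead of .NET's `(?<d>...)`, but the rest of the pattern is unchanged). Inside a character class `^` and `.` are literal, so `[\S^\.]` behaves like `\S`: the pattern matches the one <p> whose content contains no whitespace. The HTML is the question's markup with the style attribute dropped and a placeholder address.

```python
import re

HTML = """<td colspan="2">
<p>We’ve received a request to change your email address to user@example.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>
<p>Warm regards,<br>
example.com</p>
</td>"""

# Same pattern as the answer, with Python's (?P<name>...) named-group syntax.
pattern = re.compile(r"(?P<d>\<p\>[\S^\.]*</p\>)")

match = pattern.search(HTML)
print(match.group("d"))  # <p>86761G</p>
```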

If it is more complex, you can use this instead:

(?<d>\<p\>.*?</p\>)

which will find every paragraph line; you can then iterate over the matches and find the code by eliminating constant strings such as:

To complete the process, please verify your email address by entering the following verification code.
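The elimination approach can be sketched in Python as follows. This is a minimal illustration under the assumption that the constant paragraphs are known in advance: `KNOWN_PARAGRAPHS` and `find_code` are hypothetical names, and the first paragraph is skipped by a substring check because it contains the (variable) email address. The sketch uses the lazy `.*?` so that each match stops at the first `</p>`.

```python
import re

HTML = """<td colspan="2">
<p>We’ve received a request to change your email address to user@example.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>
<p>Warm regards,<br>
example.com</p>
</td>"""

# Paragraphs that never change in this email (hypothetical list; extend as needed).
KNOWN_PARAGRAPHS = {
    "To complete the process, please verify your email address by entering the following verification code.",
    "This code is temporary and will expire in 30 minutes.",
    "If this wasn’t requested by you, your account information will remain unchanged. No further action is required.",
}


def find_code(html):
    """Return the first <p> text that is not one of the known constant paragraphs."""
    # '.' does not cross newlines, so the two-line "Warm regards" paragraph is never matched.
    for text in re.findall(r"<p>(.*?)</p>", html):
        if text in KNOWN_PARAGRAPHS:
            continue
        if "received a request to change your email address" in text:
            continue  # contains the variable address, so match on a fixed substring
        return text
    return None


print(find_code(HTML))  # 86761G
```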

Upvotes: 0

Dai

Reputation: 155145

Contrary to popular (and, in my opinion, misinformed) belief, you can use regular expressions to extract this, because the overarching structure of this document does, in fact, meet the requirements of a regular grammar ( http://en.wikipedia.org/wiki/Chomsky_hierarchy ).

Here's a regex I would use:

following verification code.</p>\s*<p>(\S+)</p>

Note the lack of anchors (^ and $): the regex uses the known text "following verification code" to match just before the code, and the verification code itself is captured by the single capture group.
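A minimal Python sketch of this anchor-on-known-text approach, run against the question's markup (style attribute dropped, placeholder address). One small deviation, labeled in the comment: the `.` after "code" is escaped so it matches only a literal dot; otherwise it is the same pattern.

```python
import re

HTML = """<td colspan="2">
<p>We’ve received a request to change your email address to user@example.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>
<p>Warm regards,<br>
example.com</p>
</td>"""

# Match the fixed sentence, skip whitespace between the paragraphs, capture the next <p>'s content.
# The '.' in "code." is escaped here so it matches only a literal dot.
pattern = re.compile(r"following verification code\.</p>\s*<p>(\S+)</p>")

match = pattern.search(HTML)
print(match.group(1))  # 86761G
```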

Upvotes: 1

Marko Gresak

Reputation: 8207

Since you've mentioned that only the verification code is dynamic, I'm assuming the whole markup structure won't change.

If this is true, you could use

<p>(.*?)<\/p>

This matches every <p> element; the captured group of the third match is your verification code.
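Sketched in Python (illustration only, on the question's markup with the style attribute dropped and a placeholder address): `re.findall` returns the captured group of every match, so index 2 is the third paragraph. Because `.` does not cross newlines, the two-line "Warm regards" paragraph is not matched, which does not affect the index.

```python
import re

HTML = """<td colspan="2">
<p>We’ve received a request to change your email address to user@example.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>
<p>Warm regards,<br>
example.com</p>
</td>"""

# Lazy (.*?) stops at the first </p>; the \/ escape from the answer is harmless in Python too.
matches = re.findall(r"<p>(.*?)<\/p>", HTML)
print(matches[2])  # 86761G
```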

Upvotes: 0
