Reputation: 16229
Given this raw email:
[('96 (RFC822 {17888}',
'Delivered-To: [email protected]\r\nReceived: by 10.182.129.229 with SMTP id nz5csp2388417obb;\r\n Tue, 13 Oct 2015 14:57:14 -0700 (PDT)\r\nX-Received: by 10.68.136.103 with SMTP id pz7mr5507255pbb.114.1444773434163;\r\n Tue, 13 Oct 2015 14:57:14 -0700 (PDT)\r\nReturn-Path: <t0721aa7a92-ed37dd57c-9df2edd3ab1d4c49a5c9ac3a0569baab@bounce.twitter.com>\r\nReceived: from spruce-goose-bc.twitter.com (spruce-goose-bc.twitter.com. [199.59.150.98])\r\n by mx.google.com with ESMTPS id xm2si7949727pbb.66.2015.10.13.14.57.13\r\n for <[email protected]>\r\n (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\r\n Tue, 13 Oct 2015 14:57:14 -0700 (PDT)\r\nReceived-SPF: pass (google.com: domain of t0721aa7a92-ed37dd57c-9df2edd3ab1d4c49a5c9ac3a0569baab@bounce.twitter.com designates 199.59.150.98 as permitted sender) client-ip=199.59.150.98;\r\nAuthentication-Results: mx.google.com;\r\n spf=pass (google.com: domain of t0721aa7a92-ed37dd57c-9df2edd3ab1d4c49a5c9ac3a0569baab@bounce.twitter.com designates 199.59.150.98 as permitted sender) smtp.mailfrom=t0721aa7a92-ed37dd57c-9df2edd3ab1d4c49a5c9ac3a0569baab@bounce.twitter.com;\r\n dkim=pass [email protected];\r\n dmarc=pass (p=REJECT dis=NONE) header.from=twitter.com\r\nDKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=twitter.com;\r\n\ts=dkim-201406; t=1444773433;\r\n\tbh=WBJ/04fcxapn9W2moQ6bGL1p7salO/SDhe2f3COz1us=;\r\n\th=Date:From:To:Subject:MIME-Version:Content-Type:Message-ID;\r\n\tb=tvyrM/Sz+g0WemkLWTYoarsftOM0Y4jQAWCNdqRm6W+5kBG43CP2q6woxrtDqgYHg\r\n\t o/zPvMa5nIPjoOfslv0YCUlhfuVjr0V/6InNMl65s3/zGRMlCQxQjS+UGsQrF2zH6Z\r\n\t G7pWHMTml1NxI2r77nuOhSyhknNFCA9pl0SkeNfoyK8jcIo6rNS2uugFBw5Ta/fS8i\r\n\t RMXcNpLA35k4Znvboe2aiZQg7ZY6NjbtNT3X6Ln4xuAgLkjeS/BfDBvd6M8CZ8yIT8\r\n\t 7xStI8xTfT/zKqcK+35yqnAqQ3QD5oll/DWxQatFUIYzLsgw2DV39XRo11y6OTdDim\r\n\t KNS2DTEjaOsBg==\r\nX-MSFBL: 
eyJ1IjoiaW5nbGVzbWFuYWd1YUBnbWFpbC5jb21AMTRAMzgxNjkwOTc5M0AwQDJj\r\n\tMjQ4NDVjZTJjOGMyNjI0NDMxY2MzZDBlOGY3NTZhNDVjNGI4MzQiLCJnIjoiRXZl\r\n\tcnl0aGluZyIsImIiOiJzbWYxLWJkcC0yMy1zcjEtRXZlcnl0aGluZy4xOTgiLCJy\r\n\tIjoiaW5nbGVzbWFuYWd1YUBnbWFpbC5jb20ifQ==\r\nDate: Tue, 13 Oct 2015 21:57:13 +0000\r\nFrom: Twitter <[email protected]>\r\nTo: example <[email protected]>\r\nSubject: Confirm your Twitter account, example\r\nMIME-Version: 1.0\r\nContent-Type: multipart/alternative; \r\n\tboundary="----=_Part_44683898_1221426234.1444773433942"\r\nFeedback-ID: 16481b2a2bd9895bc6fbf92980687bb5fdd96d63782c26cd:16481b2a2bd9895bc6fbf92980687bb5fdd96d63782c26cd:none:twitterESP\r\nMessage-ID: <[email protected]>\r\n\r\n------=_Part_44683898_1221426234.1444773433942\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Transfer-Encoding: 7bit\r\n\r\nexample,\r\n\r\nConfirm your email address to complete your Twitter account. It\'s easy - just click on the button below.\r\n\r\nClick on the link below or copy and paste it into a browser:\r\n\r\nhttps://twitter.com/i/redirect?url=https%3A%2F%2Ftwitter.com%2Faccount%2Fconfirm_user_email%2F3816909793%2F9CE5D-H4F5D-144477%3Ft%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3Da6878f323b83b61ceb5eaa8fbdb2214d25fc65e7%26al%3D1%26iid%3D9df2edd3ab1d4c49a5c9ac3a0569baab%26ac%3D1%26autoactions%3D1444773433%26uid%3D3816909793%26nid%3D14%2B309&t=1&cn=ZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&sig=2b56e3a59dd6b182afaf3a0030a96b26ccc67d73&iid=9df2edd3ab1d4c49a5c9ac3a0569baab&uid=3816909793&nid=14+309\r\n------=_Part_44683898_1221426234.1444773433942\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/htm=\r\nl4/strict.dtd">\r\n<html>\r\n<head>\r\n<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8" />\r\n<meta name=3D"viewport" content=3D"width=3Ddevice-width, minimum-scale=3D1.=\r\n0, maximum-scale=3D1.0, user-scalable=3D0" 
/>\r\n<meta name=3D"apple-mobile-web-app-capable" content=3D"yes" />\r\n<style type=3D"text/css">\r\n\r\n@media only screen and (max-device-width: 420px) {\r\ntd[class=3D"spacer"]{\r\nfont-size:4px !important;\r\n\r\n}\r\n\r\nspan[class=3D"address"] a {\r\n\r\nline-height:18px !important;\r\n}\r\n\r\n\r\ntd[class=3D"margins"]{\r\nwidth:18px !important;\r\n}\r\ntd[class=3D"logo_space"]{\r\nheight:12px !important;\r\n}\r\n}\r\n\r\n@media only screen and (max-device-width: 480px) {\r\n\r\ntable[class=3D"collapse"]{\r\nwidth:100% !important;\r\n}\r\n\r\ndiv[class=3D"collapse"]{\r\nwidth:100% !important;\r\n}\r\n\r\n\r\ntd[class=3D"body_text"] {\r\nfont-size:14px !important;\r\nline-height:22px !important;\r\n\r\n\r\n}\r\n\r\ntd[class=3D"greeting"]{\r\nfont-size:14px !important;\r\n\r\n}\r\n\r\n\r\ntd[class=3D"v_space"]{\r\nheight:8px !important;\r\n\r\n}\r\n\r\n\r\nspan[class=3D"address"]{\r\ndisplay:block !important;\r\nwidth:240px !important;\r\n}\r\ntd[class=3D"cut"]{\r\ndisplay:none !important;\r\n}\r\n\r\n}\r\n</style>\r\n</head>\r\n<body bgcolor=3D"#e1e8ed" style=3D"margin:0;padding:0;-webkit-text-size-adj=\r\nust:100%;-ms-text-size-adjust:100%;">\r\n<table cellpadding=3D"0" cellspacing=3D"0" border=3D"0" width=3D"100%" bgco=\r\nlor=3D"#e1e8ed" style=3D"background-color:#e1e8ed;padding:0;margin:0;line-h=\r\neight:1px;font-size:1px;" class=3D"body_wrapper">\r\n<tbody>\r\n<tr>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;">\r\n<table class=3D"collapse" id=3D"header" align=3D"center" width=3D"500" styl=\r\ne=3D"width: 500px;padding:0;margin:0;line-height:1px;font-size:1px;" bgcolo=\r\nr=3D"#ffffff" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr>\r\n<td style=3D"min-width: 500px;height:1px;padding:0;margin:0;line-height:1px=\r\n;font-size:1px;" class=3D"cut"> <img src=3D"https://ea.twimg.com/email/self=\r\n_serve/media/spacer-1402696023930.png" style=3D"min-width: 
500px;height:1px=\r\n;margin:0;padding:0;display:block;-ms-interpolation-mode:bicubic;border:non=\r\ne;outline:none;" /> </td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n</tr>\r\n<tr>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;">\r\n<!--///////////////////header///////////////////////////-->\r\n<table class=3D"collapse" id=3D"header" align=3D"center" width=3D"500" styl=\r\ne=3D"width:500px;background-color:#ffffff;padding:0;margin:0;line-height:1p=\r\nx;font-size:1px;" bgcolor=3D"#ffffff" cellpadding=3D"0" cellspacing=3D"0" b=\r\norder=3D"0">\r\n<tbody>\r\n<tr>\r\n<td height=3D"15" style=3D"height:15px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;" class=3D"logo_space"> </td>\r\n</tr>\r\n<tr>\r\n<td style=3D"padding:0;margin:0;line-height:1px;font-size:1px;">\r\n<table cellpadding=3D"0" cellspacing=3D"0" border=3D"0" width=3D"100%" styl=\r\ne=3D"width:100%;padding:0;margin:0;line-height:1px;font-size:1px;">\r\n<tbody>\r\n<tr>\r\n<td align=3D"left" width=3D"15" style=3D"width:15px;padding:0;margin:0;line=\r\n-height:1px;font-size:1px;"></td>\r\n<td align=3D"left" width=3D"28" style=3D"padding:0;margin:0;line-height:1px=\r\n;font-size:1px;"> <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%=\r\n2F%2Ftwitter.com%3Fcn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26refsrc%3Demail&a=\r\nmp;t=3D1&cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&sig=3Dfe1cdb1344cee3=\r\nb9db0674bd2ce2f22397f739d7&iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&u=\r\nid=3D3816909793&nid=3D14+21" style=3D"text-decoration:none;border-style=\r\n:none;border:0;padding:0;margin:0;"><img align=3D"left" width=3D"28" src=3D=\r\n"https://ea.twimg.com/email/self_serve/media/logo-1400528502322.png" style=\r\n=3D"width:28px;padding-bottom:2px;margin:0;padding:0;display:block;-ms-inte=\r\nrpolation-mode:bicubic;border:none;outline:none;" /></a> </td>\r\n<td align=3D"left" width=3D"10" style=3D"width:10px;padding:0;margin:0;line=\r\n-height:1px;font-size:1px;"></td>\r\n<td 
align=3D"left" class=3D"greeting" style=3D"padding:0;margin:0;line-heig=\r\nht:1px;font-size:1px;font-family:\'Helvetica Neue Light\', Helvetica, Arial, =\r\nsans-serif;-webkit-font-smoothing:antialiased;-webkit-text-size-adjust:none=\r\n;color:#66757f;font-size:16px;padding:0px;margin:0px;font-weight:300;line-h=\r\neight:100%;text-align:left;"> example, </td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"14" style=3D"height:14px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;" class=3D"logo_space"> </td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<!--////////////////////border//////////////////////////-->\r\n<table class=3D"collapse" align=3D"center" width=3D"500" style=3D"width:500=\r\npx;background-color:#ffffff;padding:0;margin:0;line-height:1px;font-size:1p=\r\nx;" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr id=3D"border">\r\n<td colspan=3D"2" height=3D"1" style=3D"line-height:1px;display:block;heigh=\r\nt:1px;background-color:#e1e8ed;padding:0;margin:0;line-height:1px;font-size=\r\n:1px;"></td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<!--//////////////////////////////////////////////-->\r\n<table class=3D"collapse" align=3D"center" width=3D"500" style=3D"width:500=\r\npx;background-color:#ffffff;padding:0;margin:0;line-height:1px;font-size:1p=\r\nx;" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr>\r\n<td width=3D"50" style=3D"width:50px;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;" class=3D"margins"></td>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;">\r\n<table width=3D"100%" align=3D"center" cellpadding=3D"0" cellspacing=3D"0" =\r\nborder=3D"0" class=3D"collapse" style=3D"padding:0;margin:0;line-height:1px=\r\n;font-size:1px;">\r\n<tbody>\r\n<tr>\r\n<td height=3D"30" style=3D"height:30px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"left" 
style=3D"padding:0;margin:0;line-height:1px;font-size:1p=\r\nx;"> <span class=3D"headline_1" style=3D"font-family:\'Helvetica Neue Light\'=\r\n, Helvetica, Arial, sans-serif;-webkit-font-smoothing:antialiased;-webkit-t=\r\next-size-adjust:none;color:#66757f;font-size:28px;padding:0px;margin:0px;fo=\r\nnt-weight:300;line-height:100%;text-align:left;">Final step...</span> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"12" style=3D"height:12px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;" class=3D"v_space"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"left" class=3D"body_text" style=3D"padding:0;margin:0;line-hei=\r\nght:1px;font-size:1px;font-family:\'Helvetica Neue Light\', Helvetica, Arial,=\r\n sans-serif;-webkit-font-smoothing:antialiased;-webkit-text-size-adjust:non=\r\ne;color:#66757f;font-size:16px;padding:0px;margin:0px;font-weight:300;line-=\r\nheight:23px;text-align:left;"> Confirm your email address to complete your =\r\nTwitter account. It\'s easy =E2=80=94 just click on the button below. 
</td>\r\n</tr>\r\n<!--*********** button ************-->\r\n<tr>\r\n<td height=3D"22" style=3D"height:22px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"left" class=3D"button" style=3D"padding:0;margin:0;line-height=\r\n:1px;font-size:1px;">\r\n<table bgcolor=3D"#55acee" height=3D"40" border=3D"0" cellspacing=3D"0" cel=\r\nlpadding=3D"0" align=3D"left" style=3D"white-space:nowrap;border-radius:5px=\r\n;border-style:none;text-align:center;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;">\r\n<tbody>\r\n<tr>\r\n<td class=3D"spacer" width=3D"30" style=3D"font-size:1px;font-size:1px;line=\r\n-height:1px;font-size:1px;padding:0;margin:0;line-height:1px;font-size:1px;=\r\n"> </td>\r\n<td height=3D"40" align=3D"center" style=3D"padding:0;margin:0;line-height:=\r\n1px;font-size:1px;"> <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%=\r\n3A%2F%2Ftwitter.com%2Faccount%2Fconfirm_user_email%2F3816909793%2F9CE5D-H4F=\r\n5D-144477%3Ft%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3D69386bec1=\r\n102903b8e56a388d035a97f9d8e69f9%26al%3D1%26iid%3D9df2edd3ab1d4c49a5c9ac3a05=\r\n69baab%26ac%3D1%26autoactions%3D1444773433%26uid%3D3816909793%26nid%3D14%2B=\r\n308&t=3D1&cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&sig=3D256cbf355=\r\n6df8db1580c37c1e032d1178f4d23a3&iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&=\r\namp;uid=3D3816909793&nid=3D14+308" style=3D"border-style:none;text-deco=\r\nration:none;color:#ffffff;-webkit-font-smoothing: antialiased;font-size:14p=\r\nx;letter-spacing:0.02em;font-weight:bold;white-space:nowrap;overflow:hidden=\r\n;padding:0px;margin:0px;font-family:\'Helvetica Neue\', Helvetica, Arial, san=\r\ns-serif;line-height:14px;text-decoration:none;border-style:none;border:0;pa=\r\ndding:0;margin:0;"> <span class=3D"" style=3D"border-style:none;text-decora=\r\ntion:none;color:#ffffff;line-height:100%">Confirm now</span> </a> </td>\r\n<td class=3D"spacer" width=3D"30" 
style=3D"font-size:1px;font-size:1px;line=\r\n-height:1px;font-size:1px;padding:0;margin:0;line-height:1px;font-size:1px;=\r\n"> </td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n</tr>\r\n<!--*********** end button ************-->\r\n<tr>\r\n<td height=3D"44" style=3D"height:44px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;"></td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n<td width=3D"50" style=3D"width:50px;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;" class=3D"margins"></td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<!--//////////////////////////////////////////////-->\r\n<table class=3D"collapse" id=3D"footer" align=3D"center" width=3D"500" styl=\r\ne=3D"width:500px;background-color:#ffffff;padding:0;margin:0;line-height:1p=\r\nx;font-size:1px;" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr>\r\n<td height=3D"1" style=3D"line-height:1px;display:block;height:1px;backgrou=\r\nnd-color:#e1e8ed;padding:0;margin:0;line-height:1px;font-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td height=3D"20" style=3D"height:20;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;"> <span class=3D"footer_type" style=3D"font-family:\'Helvetica Neue Lig=\r\nht\', Helvetica, Arial, sans-serif;-webkit-font-smoothing:antialiased;color:=\r\n#8899a6;font-size:12px;padding:0px;margin:0px;font-weight:normal;line-heigh=\r\nt:12px;"> <a 
href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%2F%2Ftwi=\r\ntter.com%2Fi%2Fredirect%3Furl%3Dhttps%253A%252F%252Ftwitter.com%252Fsetting=\r\ns%252Fnotifications%253Fcn%253DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26t%3D1%26c=\r\nn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3D3084a7eb53ea988c00b18e060fa6a6=\r\n023b0f5c36%26iid%3D9df2edd3ab1d4c49a5c9ac3a0569baab%26uid%3D3816909793%26ni=\r\nd%3D14%2B27&t=3D1&cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&sig=3Da=\r\n53a86b7487b15c908170e0d06203350ad2e0745&iid=3D9df2edd3ab1d4c49a5c9ac3a0=\r\n569baab&uid=3D3816909793&nid=3D14+1555" class=3D"footer_link" style=\r\n=3D"text-decoration:none;border-style:none;border:0;padding:0;margin:0;font=\r\n-family:\'Helvetica Neue Light\', Helvetica, Arial, sans-serif;-webkit-font-s=\r\nmoothing:antialiased;-webkit-text-size-adjust:none;color:#55acee;font-size:=\r\n12px;padding:0px;margin:0px;font-weight:600;line-height:12px;">Settings</a>=\r\n | <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%2F%2Fsupport.tw=\r\nitter.com%2F&t=3D1&cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&sig=3D=\r\n1dfdf7cecb06258c7e6a41ca318ec4370f621673&iid=3D9df2edd3ab1d4c49a5c9ac3a=\r\n0569baab&uid=3D3816909793&nid=3D14+1557" class=3D"footer_link" styl=\r\ne=3D"text-decoration:none;border-style:none;border:0;padding:0;margin:0;fon=\r\nt-family:\'Helvetica Neue Light\', Helvetica, Arial, sans-serif;-webkit-font-=\r\nsmoothing:antialiased;-webkit-text-size-adjust:none;color:#55acee;font-size=\r\n:12px;padding:0px;margin:0px;font-weight:600;line-height:12px;">Help</a> | =\r\n<a href=3D"https://twitter.com/i/u?t=3D1&cn=3DZW1haWxfY2hhbmdlX25vdGljZ=\r\nV9uZXh0&sig=3D638d06973cb368d673778db5c8414b594d5c6ed2&iid=3D9df2ed=\r\nd3ab1d4c49a5c9ac3a0569baab&uid=3D3816909793&nid=3D14+26" class=3D"f=\r\nooter_link" style=3D"text-decoration:none;border-style:none;border:0;paddin=\r\ng:0;margin:0;font-family:\'Helvetica Neue Light\', Helvetica, Arial, 
sans-ser=\r\nif;-webkit-font-smoothing:antialiased;-webkit-text-size-adjust:none;color:#=\r\n55acee;font-size:12px;padding:0px;margin:0px;font-weight:600;line-height:12=\r\npx;">Opt-out</a> | <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A=\r\n%2F%2Ftwitter.com%2Faccount%2Fnot_my_account%2F3816909793%2F9CE5D-H4F5D-144=\r\n477%3Fut%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&t=3D1&cn=3DZW1=\r\nhaWxfY2hhbmdlX25vdGljZV9uZXh0&sig=3D0e2b07faf8b7cab119459e512ea58097f5b=\r\n8e82b&iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&uid=3D3816909793&n=\r\nid=3D14+25" class=3D"footer_link" style=3D"text-decoration:none;border-styl=\r\ne:none;border:0;padding:0;margin:0;font-family:\'Helvetica Neue Light\', Helv=\r\netica, Arial, sans-serif;-webkit-font-smoothing:antialiased;-webkit-text-si=\r\nze-adjust:none;color:#55acee;font-size:12px;padding:0px;margin:0px;font-wei=\r\nght:600;line-height:12px;">Not my account</a> </span> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"10" style=3D"height:10px;line-height:1px;font-size:1px;paddin=\r\ng:0;margin:0;line-height:1px;font-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;"> <span class=3D"address"> <a href=3D"" style=3D"text-decoration:none;=\r\nborder-style:none;border:0;padding:0;margin:0;font-family:\'Helvetica Neue L=\r\night\', Helvetica, Arial, sans-serif;-webkit-font-smoothing:antialiased;colo=\r\nr:#8899a6;font-size:12px;padding:0px;margin:0px;font-weight:normal;line-hei=\r\nght:12px;cursor:default;">Twitter, Inc. 
1355 Market Street, Suite 900 San F=\r\nrancisco, CA 94103</a> </span> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"26" style=3D"height:26;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;"></td>\r\n</tr>\r\n</tbody>\r\n</table> <img width=3D"1" height=3D"1" style=3D"display: block;margin:0;pad=\r\nding:0;display:block;-ms-interpolation-mode:bicubic;border:none;outline:non=\r\ne;" src=3D"https://twitter.com/scribe/ibis?t=3D1&cn=3DZW1haWxfY2hhbmdlX=\r\n25vdGljZV9uZXh0&iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&uid=3D381690=\r\n9793&nid=3D14+20" />\r\n<!--//////////////////////////////////////////////--> </td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n</body>\r\n</html>\r\n\r\n------=_Part_44683898_1221426234.1444773433942--\r\n')
I'm trying to extract the confirmation link that must be clicked:
https://twitter.com/i/redirect?url=https%3A%2F%2Ftwitter.com%2Faccount%2Fconfirm_user_email%2F3816909793%2F9CE5D-H4F5D-144477%3Ft%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3Da6878f323b83b61ceb5eaa8fbdb2214d25fc65ahdgdga33%3D9df2edd3ab1d4c49a5c9ac3a0569baab%26ac%3D1%26autoactions%3D1444773433%26uid%3D3816909793%26nid%3D14%2B309&t=1&cn=ZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&sig=2b56e3a59dd6b182afaf3abxcc67d73&iid=9df2edd3ab1d4c49a5c9ac3a0569baab&uid=3816909793&nid=14+309
Using regex101, I built this regex, and it seems to work well there. Yet when I use the generated Python code:
import re
p = re.compile(ur'(https.+)(\\r|\\n)')
test_str = (the full email text)
then re.search(p, test_str) returns nothing, as does re.findall().
Why would the generated Python code not work, and/or is there a better regex? Note: there are several Twitter URLs in the text; I wish to only match the one tied to the 'Confirm Now' button.
Python: 2.7
Upvotes: 1
Views: 3186
Reputation: 56819
Before you use regex, or other more appropriate tools, to extract data from the email, you should first process the email properly with an email parser. In Python, we have email.parser available out of the box:
raw_content = 'Delivered-To: [email protected]...'
import email.parser
email_parser = email.parser.Parser()
email_content = email_parser.parsestr(raw_content)
def get_all_messages(email_message):
    stack = [email_message]
    messages = []
    while len(stack):
        msg = stack.pop()
        if msg.is_multipart():
            stack += msg.get_payload()
        else:
            messages.append(msg)
    return messages
messages = get_all_messages(email_content)
The messages variable contains the individual parts of the email. You can use a regex to extract the link from the text/plain message, or use an HTML parser like BeautifulSoup to extract the link from the text/html message.
Below is example code for extracting the link from the text/plain message:
for msg in messages:
    if msg.get_content_type() == 'text/plain':
        import re
        # Decode the message according to Content-Transfer-Encoding
        # Then decode the text according to charset field in Content-Type header, fall back to UTF-8 if not specified
        payload = msg.get_payload(decode=True).decode(msg.get_content_charset('utf-8'))
        link = re.findall(ur'https?://.*', payload)
Take note of the call .get_payload(decode=True). The decode parameter must be specified to decode the payload according to the Content-Transfer-Encoding header. While it doesn't matter for the text/plain message, it affects correctness for the text/html message, since the payload in that case is quoted-printable.
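To illustrate the point about decode=True, here is a minimal sketch. The two-line message and the example.com link are made up for illustration; only email.message_from_string, get_payload and get_content_charset come from the standard library. It uses print() with a single argument so it should also run on Python 3.

```python
import email

# Hypothetical message with a quoted-printable HTML body (not the email from the question).
raw = (
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    '<a href=3D"https://example.com/confirm?id=3D1&amp;t=\r\n'
    '=3D2">Confirm now</a>\r\n'
)
msg = email.message_from_string(raw)

# Without decode=True the payload is returned as transmitted:
# "=3D" escapes and "=" soft line breaks are still present.
undecoded = msg.get_payload()

# With decode=True the Content-Transfer-Encoding is undone first,
# then the bytes are decoded using the declared charset.
text = msg.get_payload(decode=True).decode(msg.get_content_charset('utf-8'))

print(undecoded)
print(text)
```

In the undecoded payload the href is split across two lines and littered with =3D, so any regex or HTML parser applied to it would see a mangled link.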
Since there is only a single link, the simple regex above suffices.
You can use similar code to process the payload of the text/html message before parsing it with an HTML parser. After the HTML is parsed, you can select all <a> tags and retain only those that contain confirm_user_email in the link.
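The HTML route described above could look roughly like the following sketch. It swaps BeautifulSoup for the standard library's html.parser so that it has no third-party dependency, and it assumes Python 3 (on 2.7 the module is named HTMLParser). LinkCollector, confirmation_links and the shortened sample email are hypothetical names and data introduced for illustration.

```python
import email
from html.parser import HTMLParser  # Python 3 name; on 2.7 the module is HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def confirmation_links(raw_email):
    """Return hrefs containing 'confirm_user_email' from the text/html parts."""
    message = email.message_from_string(raw_email)
    found = []
    for part in message.walk():
        if part.get_content_type() != 'text/html':
            continue
        charset = part.get_content_charset('utf-8')
        # decode=True undoes quoted-printable before the HTML is parsed
        html = part.get_payload(decode=True).decode(charset)
        collector = LinkCollector()
        collector.feed(html)
        found += [link for link in collector.links if 'confirm_user_email' in link]
    return found


# Hypothetical, heavily shortened stand-in for the email in the question.
sample = (
    'MIME-Version: 1.0\r\n'
    'Content-Type: multipart/alternative; boundary="XX"\r\n'
    '\r\n'
    '--XX\r\n'
    'Content-Type: text/plain; charset=UTF-8\r\n'
    '\r\n'
    'plain text part\r\n'
    '--XX\r\n'
    'Content-Type: text/html; charset=UTF-8\r\n'
    'Content-Transfer-Encoding: quoted-printable\r\n'
    '\r\n'
    '<a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%2F%2Ftwitter.com%=\r\n'
    '2Faccount%2Fconfirm_user_email%2F123&amp;t=3D1">Confirm now</a>\r\n'
    '<a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%2F%2Fsupport.twit=\r\n'
    'ter.com%2F&amp;t=3D1">Help</a>\r\n'
    '--XX--\r\n'
)

links = confirmation_links(sample)
print(links)
```

Filtering on the href text rather than on the link's position keeps the sketch independent of how many other Twitter links the email contains.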
Upvotes: 1
Reputation: 98961
result = re.findall(r"(https.*?)(?:\r|\n)", email, re.MULTILINE)
link = result[0]
Live Python Demo
Regex Explanation
(https.*?)(?:\r|\n)
Match the regex below and capture its match into backreference number 1 «(https.*?)»
Match the character string “https” literally «https»
Match any single character that is NOT a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below «(?:\r|\n)»
Match this alternative «\r»
Match the carriage return character «\r»
Or match this alternative «\n»
Match the line feed character «\n»
Upvotes: 2
Reputation: 52049
I would use a slightly different regex:
import re
with open('out') as f: # out contains the page content
    content = f.read()
p = re.compile(u'"(https:.*?)"')
for m in re.findall(p, content):
    print m
The .*? is a non-greedy match and will stop at the first double quote.
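As a small illustration of why the non-greedy form matters, the sketch below compares it with the greedy one on a made-up one-line fragment (the example.com links are not from the email). It assumes each link sits on a single line, since . does not match a line feed.

```python
import re

# Hypothetical one-line fragment with two quoted links.
html = '<a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>'

# Greedy: .* runs to the LAST double quote on the line, swallowing both links.
greedy = re.findall(r'"(https:.*)"', html)

# Non-greedy: .*? stops at the FIRST double quote after each "https:".
lazy = re.findall(r'"(https:.*?)"', html)

print(greedy)
print(lazy)
```

The greedy pattern returns one long string spanning both tags, while the non-greedy pattern returns the two URLs separately.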
Upvotes: 1
Reputation: 458
Try removing the "ur" from the beginning of your regex. You can also call search directly on the compiled pattern object.
Try this:
import re
p = re.compile('(https.+)(\\r|\\n)')
test_str = (the full email text)
desired_string = p.search(test_str)
print desired_string.group(0)
Upvotes: 1
Reputation: 9609
In a raw string literal the backslash is not an escape character, so don't double it. Either remove the r at the beginning:
p = re.compile(u'(https.+)(\\r|\\n)')
Or don't use double backslashes:
p = re.compile(ur'(https.+)(\r|\n)')
Hope it helps!
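The difference between the two spellings can be checked with a short sketch. The sample text and the names doubled and single are made up for illustration, and the plain r prefix is used instead of ur because ur is a syntax error on Python 3; the escaping behaviour is the same.

```python
import re

# Hypothetical sample: a link followed by a real CR/LF pair.
text = 'see https://example.com/confirm\r\nnext line'

# Raw literal, doubled backslashes: the regex engine receives \\r and \\n,
# i.e. a literal backslash followed by the letter r or n.
doubled = re.compile(r'(https.+)(\\r|\\n)')

# Raw literal, single backslashes: the regex engine receives \r and \n,
# i.e. the carriage-return and line-feed characters.
single = re.compile(r'(https.+)(\r|\n)')

print(doubled.search(text))  # no match in this text
match = single.search(text)
# "." matches "\r" (only "\n" is excluded), so the greedy group keeps a trailing "\r".
link = match.group(1).rstrip('\r')
print(link)
```

The doubled form would only match text that contains the two visible characters backslash and r (for example a pasted repr of the email), which is not what the decoded message contains.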
Upvotes: 1