Reputation: 2669
I have a string which is like:
Return-Path: [email protected]
Received-SPF: pass (fake.link.com: Sender is authorized to use '[email protected]' in 'mfrom' identity (mechanism 'include:spf.smtp2go.com' matched)) receiver=pmxlab01.permission.email; identity=mailfrom; envelope-from="[email protected]"; helo=e2i353.smtp2go.com; client-ip=103.2.141.97
Received: from e2i353.smtp2go.com (e2i353.smtp2go.com [103.2.141.97])
by mailserver.fake.com(Proxmox) with ESMTP id A4F983E1048
for <[email protected]>; Tue, 24 Aug 2021 14:47:20 +0100 (BST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
d=smtpcorp.com; s=a1-4; h=Feedback-ID:X-Smtpcorp-Track:Message-Id:Subject:
Date:To:From:Reply-To:Sender:List-Unsubscribe;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=; b=STU7lctit7L5LJ2tA3Re1fe4II
lXJbY/SBXTGqCHh9p4K86aLK5Bvz98Q7eR9xwjFib6x4NoZZ5L1fke0XQERd1eQvxkl9R+kRIGU8A
QOtrLPpt8coN8P+syoaTRR4pDJQG9OfJO1fON9OaOP8HwnEg/91ie6Cm+wQRxjwyat859uAcu89Xv
6/mrcequkSp6kfiQN4goZ7vMYJYfBYuooslbTciaK4SYIfxdINyrrWGA6QhJPobdW0uuedRNY5jBG
OdMbVmm7FTpxDJs51rB1PTIcFQ8W1oypcttqSgCjI+5eMVrabU/IoIxhX5F0Cn3zm7E9CHlaJuLt1
CRXVbwdw==;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=fake.com; [email protected]; q=dns/txt; s=s575655;
t=1629812840; h=from : subject : to : message-id : date;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=;
b=TEeEsPNLf7Wi6b8aaxE6JvfymfBKYjLq7izcUVrOXTW7sGIznxOA5udhfmDh15Fgp6Qgh
Kv5HX9uPNa8TEeoaJ+gV/4KERuscnc4GXEHwo0eclktx6f6JI5h1/q+qCe34+cN/EweaP5n
iOs+nrzsRuWn/iQ0Yck+b4IXVWHoTW8298xmBNuC1JF4jIVXREJFAC0nACfGU03OlpjDXf/
qvI6Ffnn5YGTNxgIkOdrtymaqOvjG9NM0PWtgSkvsTCJdUvxkrI+rRUG6ixiNi+vifqwvox
aQ6BRnMmeNK7A954Dy9r9r09QzbTthsBsi+lORKH7DntBKhm7Rb5/Q9j0xVA==
Received: from [10.176.58.103] (helo=SmtpCorp) by smtpcorp.com with esmtpsa
(TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <[email protected]>)
id 1mIWls-TRjyEC-AK for [email protected]; Tue, 24 Aug 2021 13:47:20 +0000
Received: from [10.86.20.232] (helo=DESKTOP-69OG2R3)
by smtpcorp.com with esmtpsa (TLS1.2:ECDHE_RSA_SECP256R1__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <[email protected]>)
id 1mIWlr-9EFPsz-U0 for [email protected]; Tue, 24 Aug 2021 13:47:19 +0000
MIME-Version: 1.0
From: [email protected]
To: [email protected]
Date: 24 Aug 2021 14:46:30 +0100
Subject: Test Email 2xM9e5Dj
Content-Type: multipart/alternative;
boundary=--boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Message-Id: <[email protected]>
X-Smtpcorp-Track: 1XmW_r9EFeszl0.JChXLDDjoy7xH
Feedback-ID: 575655m:575655aVI_MaS:575655sNpPp5WOdD
X-Report-Abuse: Please forward a copy of this message, including all headers,
to <[email protected]>
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a text message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a html message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e--
This is stored in a variable called $emailText.
I'm trying to use a regex to take the From part out of the text:
From: [email protected]
My regex isn't super strong, but in my testing this appears to work: (?<=From: ).*.
But when I try to extract the text, I can't get the regex to work properly.
echo [[ $emailText =~ (?<=From: ).*. ]]
Upvotes: 8
Views: 207
Reputation: 133518
With your shown samples, please try the following awk code. It checks whether the 1st field of a line is From: and, if so, prints the 2nd field of that line.
awk '$1=="From:"{print $2}' Input_file
2nd solution: If there is only one From: entry in the whole file, try the following. It calls exit right after printing the matched line, which avoids reading the rest of Input_file unnecessarily.
awk '$1=="From:"{print $2;exit}' Input_file
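To illustrate the difference between the two variants, here is a minimal sketch that reads from a variable instead of Input_file (the question keeps the message in $emailText). The sample message and the example.com addresses are made up for the demonstration; the second From: line stands in for quoted text in a message body.

```shell
#!/usr/bin/env bash
# Made-up sample: one real From: header plus a quoted "From:" line in the body.
emailText=$'From: alice@example.com\nTo: bob@example.com\n\nFrom: carol@example.com wrote:'

# Without exit: every line whose first field is "From:" is printed.
all=$(awk '$1=="From:"{print $2}' <<< "$emailText")

# With exit: awk stops after the first match.
first=$(awk '$1=="From:"{print $2; exit}' <<< "$emailText")

printf 'all:\n%s\n' "$all"
printf 'first: %s\n' "$first"
```

With this input, the first variant prints both addresses, while the second prints only alice@example.com.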
Upvotes: 3
Reputation: 88601
With bash:
[[ "$emailText" =~ From:\ ([^$'\n']*) ]] && echo "${BASH_REMATCH[1]}"
Output:
[email protected]
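One caveat: the pattern above is not anchored, so it could also match a header whose name merely ends in "From: " (for example a hypothetical X-Original-From: line). A minimal sketch of an anchored variant follows, assuming a made-up helper name extract_from and made-up example.com/example.org addresses:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first From: header in a message.
extract_from() {
  local text=$1
  local nl=$'\n'
  # ERE: start of string or a newline, then "From: ", then the rest of that line.
  local re="(^|${nl})From: ([^${nl}]*)"
  [[ $text =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

# Made-up sample with a look-alike header before the real From: line.
sample=$'Subject: hi\nX-Original-From: other@example.org\nFrom: alice@example.com\nTo: bob@example.com'
extract_from "$sample"
```

Keeping the pattern in a variable avoids quoting problems inside [[ ... ]]; the address ends up in the second capture group because the first group is the anchor.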
Upvotes: 3
Reputation: 785146
bash regex doesn't support lookbehind or lookahead assertions.
It is much easier to use a non-regex approach using awk here:
awk -F ': ' '$1 == "From" {print $2}' <<< "$emailText"
[email protected]
Upvotes: 5
Reputation: 163342
If a mail address is expected to be present, you can match the whole line first using awk, which avoids any need for the unsupported lookarounds:
awk 'match($0, /^From: [^[:space:]@]+@[^[:space:]@]+$/) {
print $2
}' <<< "$emailText"
Output
[email protected]
Upvotes: 2
Reputation: 189387
Assuming you only want the email terminus, here's a quick and dirty Awk script.
awk '/^$/ { exit 1 }
/^From: .* <[^<>@]+@[^<>]+>/ {
split($0, g, /[<>]/); print g[2]; exit }
/^From: / { print $2; exit }' file.eml
This should work correctly for all these cases:
From: Real Name <[email protected]>
From: "Name, Real" <[email protected]>
From: [email protected]
From: [email protected] (Real Name)
From: =?q?utf-8?Real_N=A3=E4me?= <[email protected]>
As the last example in particular should convince you, significantly more work is needed if you also want the correspondent's full name in normalized form.
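A quick way to check the script against the header forms listed above is to feed it one header at a time. The sketch below wraps the script in a hypothetical function from_addr that reads stdin, prints the address between the angle brackets (the second element of the split), and uses a made-up address real@example.com:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Awk script; reads a message on stdin.
from_addr() {
  awk '/^$/ { exit 1 }                   # blank line: headers ended, no From: seen
    /^From: .* <[^<>@]+@[^<>]+>/ {       # display name followed by <address>
      split($0, g, /[<>]/); print g[2]; exit }
    /^From: / { print $2; exit }'        # bare address, optionally with (comment)
}

# Made-up sample headers covering the plain-ASCII forms.
while IFS= read -r header; do
  printf '%s -> %s\n' "$header" "$(printf '%s\n' "$header" | from_addr)"
done <<'EOF'
From: Real Name <real@example.com>
From: "Name, Real" <real@example.com>
From: real@example.com
From: real@example.com (Real Name)
EOF
```

Each line should report real@example.com as the extracted address.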
Upvotes: 2