MissCoder87

Extracting string to variable using regex bash

I have a string like this:

Return-Path: [email protected]
Received-SPF: pass (fake.link.com: Sender is authorized to use '[email protected]' in 'mfrom' identity (mechanism 'include:spf.smtp2go.com' matched)) receiver=pmxlab01.permission.email; identity=mailfrom; envelope-from="[email protected]"; helo=e2i353.smtp2go.com; client-ip=103.2.141.97
Received: from e2i353.smtp2go.com (e2i353.smtp2go.com [103.2.141.97])
    by mailserver.fake.com(Proxmox) with ESMTP id A4F983E1048
    for <[email protected]>; Tue, 24 Aug 2021 14:47:20 +0100 (BST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
    d=smtpcorp.com; s=a1-4; h=Feedback-ID:X-Smtpcorp-Track:Message-Id:Subject:
    Date:To:From:Reply-To:Sender:List-Unsubscribe;
    bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=; b=STU7lctit7L5LJ2tA3Re1fe4II
    lXJbY/SBXTGqCHh9p4K86aLK5Bvz98Q7eR9xwjFib6x4NoZZ5L1fke0XQERd1eQvxkl9R+kRIGU8A
    QOtrLPpt8coN8P+syoaTRR4pDJQG9OfJO1fON9OaOP8HwnEg/91ie6Cm+wQRxjwyat859uAcu89Xv
    6/mrcequkSp6kfiQN4goZ7vMYJYfBYuooslbTciaK4SYIfxdINyrrWGA6QhJPobdW0uuedRNY5jBG
    OdMbVmm7FTpxDJs51rB1PTIcFQ8W1oypcttqSgCjI+5eMVrabU/IoIxhX5F0Cn3zm7E9CHlaJuLt1
    CRXVbwdw==;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=fake.com; [email protected]; q=dns/txt; s=s575655;
 t=1629812840; h=from : subject : to : message-id : date;
 bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=;
 b=TEeEsPNLf7Wi6b8aaxE6JvfymfBKYjLq7izcUVrOXTW7sGIznxOA5udhfmDh15Fgp6Qgh
 Kv5HX9uPNa8TEeoaJ+gV/4KERuscnc4GXEHwo0eclktx6f6JI5h1/q+qCe34+cN/EweaP5n
 iOs+nrzsRuWn/iQ0Yck+b4IXVWHoTW8298xmBNuC1JF4jIVXREJFAC0nACfGU03OlpjDXf/
 qvI6Ffnn5YGTNxgIkOdrtymaqOvjG9NM0PWtgSkvsTCJdUvxkrI+rRUG6ixiNi+vifqwvox
 aQ6BRnMmeNK7A954Dy9r9r09QzbTthsBsi+lORKH7DntBKhm7Rb5/Q9j0xVA==
Received: from [10.176.58.103] (helo=SmtpCorp) by smtpcorp.com with esmtpsa
 (TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
 (Exim 4.94.2-S2G) (envelope-from <[email protected]>)
 id 1mIWls-TRjyEC-AK for [email protected]; Tue, 24 Aug 2021 13:47:20 +0000
Received: from [10.86.20.232] (helo=DESKTOP-69OG2R3)
 by smtpcorp.com with esmtpsa (TLS1.2:ECDHE_RSA_SECP256R1__AES_256_GCM:256)
 (Exim 4.94.2-S2G) (envelope-from <[email protected]>)
 id 1mIWlr-9EFPsz-U0 for [email protected]; Tue, 24 Aug 2021 13:47:19 +0000
MIME-Version: 1.0
From: [email protected]
To: [email protected]
Date: 24 Aug 2021 14:46:30 +0100
Subject: Test Email 2xM9e5Dj
Content-Type: multipart/alternative;
 boundary=--boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Message-Id: <[email protected]>
X-Smtpcorp-Track: 1XmW_r9EFeszl0.JChXLDDjoy7xH
Feedback-ID: 575655m:575655aVI_MaS:575655sNpPp5WOdD
X-Report-Abuse: Please forward a copy of this message, including all headers,
 to <[email protected]>


----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

This is a text message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

This is a html message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e--


This is stored in a variable called $emailText.

I'm trying to use a regex to extract the From line from the text:

From: [email protected]

My regex isn't super strong, but my testing suggests this works: (?<=From: ).*.

But when I try to extract the text, I can't get the regex to work in bash:

echo [[ $emailText =~ (?<=From: ).*. ]]

Upvotes: 8

Views: 207

Answers (5)

RavinderSingh13

With your shown samples and attempts, please try the following awk code. It checks whether the 1st field of a line is From: and, if so, prints that line's 2nd field.

awk '$1=="From:"{print $2}' Input_file

2nd solution: if there is only one From: entry in the whole file, add exit after printing the matched line so awk stops instead of needlessly reading the rest of Input_file.

awk '$1=="From:"{print $2;exit}' Input_file
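Since the text in the question lives in $emailText rather than in a file, the same awk program can read it from a here-string, and command substitution stores the result in a variable. A minimal sketch; the sample headers, the example.com addresses, and the variable name from are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sample input; in the question this is already in $emailText.
emailText=$'Subject: Test Email\nFrom: sender@example.com\nTo: rcpt@example.com'

# Feed the variable to awk via a here-string and capture the 2nd field
# of the first line whose 1st field is "From:".
from=$(awk '$1=="From:"{print $2; exit}' <<< "$emailText")
printf '%s\n' "$from"
```

This prints sender@example.com, and $from can be used later in the script.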

Upvotes: 3

Cyrus

With bash:

[[ "$emailText" =~ From:\ ([^$'\n']*) ]] && echo "${BASH_REMATCH[1]}"

Output:

[email protected]
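One caveat: the pattern above is not tied to the start of a line, so a header such as Reply-From: would also match, and in bash's =~ a plain ^ only matches at the very start of the whole string. A sketch that anchors "From: " to a line start by keeping the regex in a variable (the sample text, including its made-up X-Note header and CRLF line endings, is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical sample: a decoy "Reply-From:" appears before the real header,
# and lines end in CRLF as raw mail often does.
emailText=$'Received: from host\nX-Note: Reply-From: nobody\r\nFrom: sender@example.com\r\nTo: rcpt@example.com\r'

# $'...' turns \n into a real newline, so (^|\n) means "start of string
# or just after a newline", i.e. the start of a line.
re=$'(^|\n)From: ([^\n]*)'

if [[ $emailText =~ $re ]]; then
    from=${BASH_REMATCH[2]}
    from=${from%$'\r'}    # drop a trailing CR if the text has CRLF endings
    printf '%s\n' "$from"
fi
```

Storing the pattern in $re and leaving it unquoted inside [[ ]] keeps bash from treating it as a literal string.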

Upvotes: 3

anubhava

bash's =~ uses POSIX extended regular expressions, which don't support lookbehind or lookahead assertions.

It is much easier to use a non-regex approach using awk here:

awk -F ': ' '$1 == "From" {print $2}' <<< "$emailText"

[email protected]
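If you'd rather keep the lookbehind from the question, a tool with Perl-compatible regexes can run it as written. A sketch assuming GNU grep built with PCRE support (-P is a GNU extension and may be missing or disabled in other greps); the sample text and the variable name from are made up:

```shell
#!/usr/bin/env bash
# Hypothetical sample input standing in for $emailText.
emailText=$'Subject: Test Email\nFrom: sender@example.com\nTo: rcpt@example.com'

# -P: Perl-compatible regex, which supports (?<=...)
# -o: print only the matched part of the line
# -m 1: stop after the first matching line
from=$(grep -o -P -m 1 '(?<=^From: ).*' <<< "$emailText")
printf '%s\n' "$from"
```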

Upvotes: 5

The fourth bird

If a mail address must be present, you can first check the whole line against an address pattern with awk's match() and then print the second field (awk doesn't support lookarounds either, but none are needed here):

awk 'match($0, /^From: [^[:space:]@]+@[^[:space:]@]+$/) {
  print $2
}' <<< "$emailText"

Output

[email protected]
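match() also sets RSTART and RLENGTH, so the address itself can be cut out with substr() instead of relying on it being the second field, which also covers a display name in front of it. A sketch assuming the address contains no spaces, tabs, or angle brackets; the sample text and the variable name from are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sample with a display name before the address.
emailText=$'Subject: Test\nFrom: Real Name <sender@example.com>\nTo: rcpt@example.com'

from=$(awk '/^From: / && match($0, /[^ \t<>@]+@[^ \t<>@]+/) {
    # RSTART/RLENGTH give the position and length of the match
    print substr($0, RSTART, RLENGTH)
    exit
}' <<< "$emailText")
printf '%s\n' "$from"
```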

Upvotes: 2

tripleee

Assuming you only want the bare email address, here's a quick and dirty Awk script.

awk '/^$/ { exit 1 }  # end of headers reached without a From: line
    /^From: .* <[^<>@]+@[^<>]+>/ {
        # g[2] is the text between the angle brackets
        split($0, g, /[<>]/); print g[2]; exit }
    /^From: / { print $2; exit }' file.eml

This should work correctly for all these cases:

From: Real Name <[email protected]>
From: "Name, Real" <[email protected]>
From: [email protected]
From: [email protected] (Real Name)
From: =?q?utf-8?Real_N=A3=E4me?= <[email protected]>

As the last example in particular should convince you, you will need significantly more work if you also want the correspondent's full name in normalized form.
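To run this kind of script on $emailText instead of file.eml, read the variable from a here-string; the exit 1 on the blank line that ends the headers then lets the caller branch on whether a From: header was found. A sketch with made-up sample text; it prints the second piece of the split (g[2]), which is the address between the angle brackets:

```shell
#!/usr/bin/env bash
# Hypothetical sample: quoted display name, then headers, blank line, body.
emailText=$'Subject: Test\nFrom: "Name, Real" <sender@example.com>\nTo: rcpt@example.com\n\nbody'

# The if tests awk's exit status, which the assignment passes through.
if from=$(awk '/^$/ { exit 1 }
    /^From: .* <[^<>@]+@[^<>]+>/ {
        split($0, g, /[<>]/); print g[2]; exit }
    /^From: / { print $2; exit }' <<< "$emailText"); then
    printf 'sender: %s\n' "$from"
else
    echo 'no From: header' >&2
fi
```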

Upvotes: 2
