Reputation: 195
I am trying to extract the To header from an email file using sed on Linux.
The problem is that the To header could be on multiple lines.
e.g:
To: [email protected], [email protected],
[email protected], [email protected],
[email protected]
Message-ID: <[email protected]>
I tried the following:
sed -n -e '/^[Tt]o: / { N; p; }' _message_file_ |
awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
The sed command extracts the line starting with To and the next line. I pipe the output to awk to put everything on a single line.
The full command outputs in one line:
To: [email protected], [email protected], [email protected], [email protected]
I don't know how to keep going and test if the next line starts with whitespace and add it to the result.
What I want is all the addresses on one line:
To: [email protected], [email protected], [email protected], [email protected], [email protected]
Any help will be appreciated.
Upvotes: 2
Views: 2534
Reputation: 119
It could be as straightforward as this:
sed -n '/^To:/{
:a
p
n
/^[[:space:]]/ba
}'
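With -n, sed prints nothing by default. Starting at the To: header, the script prints each line and keeps looping for as long as the next line begins with whitespace, i.e. is still a continuation of that header.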
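As a quick check, here is a minimal sketch that runs this script on a made-up message and then joins the printed lines into one with awk. The sample addresses and the variable names (msg, to_lines, to_header) are assumptions for illustration, not part of the original post.

```shell
# Build a small sample message (hypothetical addresses) to try the script on.
msg=$(mktemp)
cat > "$msg" <<'EOF'
From: sender@example.com
To: a@example.com, b@example.com,
 c@example.com, d@example.com,
 e@example.com
Message-ID: <1234@example.com>

Body text
EOF

# Print the To: line and every continuation line that follows it.
to_lines=$(sed -n '/^To:/{
:a
p
n
/^[[:space:]]/ba
}' "$msg")

# Join the folded lines: $1=$1 trims and squeezes whitespace on each line,
# and sep puts a single space between consecutive lines.
to_header=$(printf '%s\n' "$to_lines" |
  awk '{$1=$1; printf "%s%s", sep, $0; sep=" "} END {print ""}')

echo "$to_header"
# prints: To: a@example.com, b@example.com, c@example.com, d@example.com, e@example.com
rm -f "$msg"
```

The awk step is only needed if the header should end up on one line; without it the three lines are printed exactly as they appear in the file.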
Upvotes: 0
Reputation: 14999
Both formail and reformail have a -c option to do exactly that.
From man reformail:
-c Concatenate multi-line headers. Headers split on multiple lines are combined into a single line.
So you don't need to pipe the output to awk, and can just do
reformail -c -X To: < "$your_message_file"
However, emails normally use CRLF line endings, and the output on screen may be garbled because of the CR characters. To remove them, you can use Perl's generic line-ending escape \R in a regex on the output:
reformail -c -X To: < "$your_message_file" | perl -pe 's/\R/\n/g'
or do it on the input if you prefer:
perl -pe 's/\R/\n/g' "$your_message_file" | reformail -c -X To:
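The effect of the normalization step can be seen without reformail installed. The following is a minimal sketch, assuming Perl 5.10 or later (for \R) and a made-up CRLF message; the variable names msg, before and after are illustrative only.

```shell
# Build a sample message with CRLF line endings (addresses are made up).
msg=$(mktemp)
printf 'To: a@example.com,\r\n b@example.com\r\nMessage-ID: <1234@example.com>\r\n\r\nBody\r\n' > "$msg"

# A literal carriage return to search for.
cr=$(printf '\r')

# Count the lines that contain a CR before and after normalizing.
# grep exits non-zero when it finds nothing, hence the "|| true".
before=$(grep -c "$cr" "$msg" || true)
after=$(perl -pe 's/\R/\n/g' "$msg" | grep -c "$cr" || true)

echo "lines with CR before: $before, after: $after"
# prints: lines with CR before: 5, after: 0
rm -f "$msg"
```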
On Debian and derived systems like Ubuntu, you can install them with apt install maildrop for reformail, which is part of Courier's maildrop, or apt install procmail for formail (but procmail seems to be abandoned now).
Upvotes: 3
Reputation: 16797
formail is a good solution, but here's how to do it with sed:
sed -e '/^$/q;/^To:/!d;n;:c;/^\s/!d;n;bc' message_file
/^$/q; - (optional) quit if we run out of headers
/^To:/!d; - if not a To: header, stop processing this line
n; - otherwise, implicitly print it, and load next line
:c; - c is a label we can branch to
/^\s/!d; - if not a continuation, stop processing this line
n; - otherwise, implicitly print it, and load next line
bc - branch back to label c (i.e. loop)
Upvotes: 4
Reputation: 195
I did it like this:
cat _message_file | formail -X To: | awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
Or:
formail -X To: < _message_file | awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
Upvotes: 2
Reputation: 58473
This might work for you (GNU sed):
sed -n '/^To:/{:a;N;/^ /Ms/\s*\n\s*/ /;ta;P}' file
Turn off implicit printing by using the -n option. Starting from the line that begins To:, gather up the lines starting with white space, removing the white space on either side of each newline and replacing it with a single space. When matching fails, print the first line in the pattern space.
To print addresses as is, use:
sed '/^\S/h;G;/^To:/MP;d' file
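To compare what the two commands produce, here is a small sketch that runs both on a made-up message. It assumes GNU sed (the \s and \S escapes and the M flag are GNU extensions) and continuation lines that start with a space; the addresses and the variable names msg, joined and as_is are hypothetical.

```shell
# Sample message with a folded To: header (addresses are made up).
msg=$(mktemp)
cat > "$msg" <<'EOF'
From: sender@example.com
To: a@example.com, b@example.com,
 c@example.com, d@example.com,
 e@example.com
Message-ID: <1234@example.com>

Body text
EOF

# First command: fold the continuation lines into the To: line.
joined=$(sed -n '/^To:/{:a;N;/^ /Ms/\s*\n\s*/ /;ta;P}' "$msg")

# Second command: print the To: header lines unchanged.
as_is=$(sed '/^\S/h;G;/^To:/MP;d' "$msg")

printf '%s\n' "$joined"
# prints: To: a@example.com, b@example.com, c@example.com, d@example.com, e@example.com
printf '%s\n' "$as_is"
rm -f "$msg"
```

The second printf shows the three original lines, with the leading space of each continuation line preserved.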
Upvotes: 2