Farhan Umer

Reputation: 53

Print first column and email with awk or sed

How can I use AWK in the following situation?

Example input:

17  [email protected]
9   Limited <[email protected]>
8  "Fishing Forum" <[email protected]>

Desired output:

17  [email protected]
9   [email protected]
8   [email protected]

I want to print $1 together with the email address from each line.

Upvotes: 0

Views: 1530

Answers (4)

shellter

Reputation: 37298

If your data is really as simple as you show, you can use awk's sub() function to get rid of what you don't want, i.e.

 awk '{
      # inside the implied awk process-all-lines-of-input-loop
      email=$0
      if (email ~ /<.*>/) {
        sub(/^.*</,"", email)
        sub(/>.*$/,"", email)
      } 
      else { email=$2 }
      printf("%s\t%s\n", $1, email)
      }' mailFile > newMailFile

cat newMailFile
17      [email protected]
9       [email protected]
8       [email protected]

Note that we've copied the complete line ($0) into the variable email, then removed all chars from the left through the last < char (the .* match is greedy), then removed everything from the closing > char to the end of the email variable. Note that email addresses can be rather complicated to parse for the corner cases, so it's possible that this technique may miss some cases, but given its simplicity, it should be good enough.

Also, if you're not used to awk and shell programming, note that you can't overwrite your input file by using the same name for the output file. DON'T attempt something like awk '....' file > file. The shell truncates file before awk ever reads it, so it will essentially wipe out file.
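If you do need the result to end up under the original filename, one common workaround is to write to a temporary file and rename it only if awk succeeded. A minimal sketch, assuming mktemp is available; the sample addresses and the temp file names are placeholders, not from the question:

```shell
# Hypothetical sample file; the addresses are placeholders.
workdir=$(mktemp -d)
mailFile="$workdir/mailFile"
printf '%s\n' '17  alice@example.com' '9   Limited <bob@example.org>' > "$mailFile"

# Write to a temporary file first, then replace the original only on success.
tmp=$(mktemp "$workdir/mail.XXXXXX")
awk '{
  email = $0
  if (email ~ /<.*>/) {
    sub(/^.*</, "", email)   # drop everything up to the last "<"
    sub(/>.*$/, "", email)   # drop the ">" and anything after it
  } else {
    email = $2               # bare address in the second field
  }
  print $1 "\t" email
}' "$mailFile" > "$tmp" && mv "$tmp" "$mailFile"

cat "$mailFile"
```

Because of the &&, the mv only runs when awk exits cleanly, so a failed run leaves the original file untouched.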

The printf is a fancy way to print your data, the \t gives you a tab char in between the 2 fields. You could also do it more simply with print $1 "\t" email.

IHTH.

Upvotes: 1

Ali Okan Yüksel

Reputation: 388

You can use sed for that:

$ ./test.sh | sed -r -e 's/<//g' -e 's/>//g' -e 's/^([0-9]+).* (.+)$/\1 \2/'
17 [email protected]
9 [email protected]
8 [email protected]

Upvotes: 0

Eran Ben-Natan

Reputation: 2615

To handle all possible email formats (see tripleee's comment), you need to match the email address itself with a regexp:

gawk --re-interval '{match($0,/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/);print $1 " " substr($0,RSTART,RLENGTH)}'

The regexp is taken from here: http://www.regular-expressions.info/email.html. You should test it to verify that it covers all the valid email addresses you expect to see.
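One way to do that testing is to feed a few representative lines through the match and flag the ones where nothing was found. This is only a sketch: the function name extract_email and the sample addresses are made up for illustration, and the TLD part is written as "two or more letters" instead of the {2,4} interval so that it also runs on awk builds without interval support.

```shell
# Hypothetical helper; reads lines on stdin, prints "$1 <address>" per line.
extract_email() {
  awk '{
    # Variant of the pattern without the {2,4} interval: the TLD is
    # "two or more letters", which plain awk implementations also accept.
    if (match($0, /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z][A-Za-z]+/)) {
      print $1 " " substr($0, RSTART, RLENGTH)
    } else {
      print $1 " (no match)"
    }
  }'
}

# Placeholder test lines, including one without any address.
printf '%s\n' \
  '17  alice@example.com' \
  '9   Limited <bob@example.org>' \
  '8  "Fishing Forum" <carol.smith+list@mail.example.net>' \
  '3   no address here' | extract_email
```

Checking the result of match() matters: when nothing matches, RSTART is 0 and substr() would quietly print an empty address.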

Upvotes: 2

Suku

Reputation: 3880

$ cat stack 
17  [email protected]
9   Limited <[email protected]>
8  "Fishing Forum" <[email protected]>

$ cat stack | awk '{ print $1" "$NF }' | sed 's/<//g; s/>//g'
17 [email protected]
9 [email protected]
8 [email protected]

If you want a tab after the first column of the output, use the following:

$ cat stack | awk '{ print $1"\t"$NF }' | sed 's/<//g; s/>//g'
17  [email protected]
9   [email protected]
8   [email protected]

If you only need email address:

$ cat stack | awk '{ print $NF }' | sed 's/<//g; s/>//g'
[email protected]
[email protected]
[email protected]

FYI: NF gives you the total number of fields in a line, so $NF refers to the last field.
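To make the NF versus $NF difference concrete, and to show that the bracket stripping can also be done inside awk without the extra sed, here is a small sketch. The sample line and the function name last_field_email are placeholders for illustration, and like the commands above it assumes the address is always the last whitespace-separated field.

```shell
# Hypothetical sample line; the address is a placeholder.
line='8  "Fishing Forum" <carol@example.net>'

# NF is the field count; $NF is the content of the last field.
printf '%s\n' "$line" | awk '{ print NF; print $NF }'

# Same idea as the awk | sed pipeline, in a single awk process:
# copy the last field, strip the angle brackets, print it after $1.
last_field_email() {
  awk '{ email = $NF; gsub(/[<>]/, "", email); print $1 " " email }'
}
printf '%s\n' "$line" | last_field_email
```

The first command prints 4 and then <carol@example.net>; the second prints 8 carol@example.net.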

Upvotes: 0
