Reputation: 467
I have a text file with an irregular structure, like the following:
first_name1 last_name1 designation1 email1 phone_number1
first_name2 last_name2 designation2 email2
first_name3 last_name3 designation3 email3 phone_number3 address3
As you can see, the email can be the last column, the second-to-last column, or the third-to-last column, so one cannot simply use $NF to get it. My goal is to find the email address wherever it is on the line and then extract the portion before the @; for instance, if email1 = [email protected], I want to extract foobar. How can I write an awk command to extract the first portion of the email address? I tried the following, but it looks for an exact match. How can I turn it into a regex match to get the job done?
awk '{for(i=1;i<=NF;i++){ if($i=="[email protected]"){print $i} } }' users.txt
Upvotes: 0
Views: 70
Reputation: 7161
You are comparing $i to the string "[email protected]", so of course this only makes an exact comparison. What you seem to be looking for is whether $i matches (~) a regular expression (/.../ instead of "..."); you can then tailor the regex to your needs. Try something like:
awk '{for(i=1;i<=NF;++i){if ($i ~ /.+@.+/){sub(/@.*$/, "", $i); print $i; next}}}'
The regex /.+@.+/ matches a string with an @ in it and something non-empty both before and after it. It will not match, for example, @foobar, foobar@, or just @. You might want to consider something more like /.+@.+\..+/, which matches (something)@(something).(something), since domain names usually have a . in them. You can tailor this regex to be more specific if you wish.
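To make the difference between the two patterns concrete, here is a small sketch that runs both against a handful of made-up tokens. The tokens and the example.com domain are assumptions for illustration only; they are not taken from the question's data.

```shell
# Hypothetical sample tokens, one per line (not from the original post).
tokens='foobar@example.com
foobar@localhost
@foobar
foobar@
@'

# Loose pattern: at least one character before and after the @.
loose=$(printf '%s\n' "$tokens" | awk '$0 ~ /.+@.+/')

# Stricter pattern: additionally requires a literal dot after the @.
strict=$(printf '%s\n' "$tokens" | awk '$0 ~ /.+@.+\..+/')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

With these tokens, the loose pattern keeps both foobar@example.com and foobar@localhost, while the stricter one keeps only foobar@example.com.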
The sub(/@.*$/, "", $i) replaces everything in $i from the first @ (inclusive) to the end of the field ($) with the empty string "", which strips the @ and the domain and leaves only the part before the @ (i.e. the username). The print $i prints it, and the next moves on to the next line (skipping any remaining fields of the current record).
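As a rough end-to-end check, the command can be run against a small file shaped like the one in the question. The names, phone numbers, and example.com addresses below are invented sample data, and the temporary file stands in for users.txt.

```shell
# Hypothetical sample data mirroring the irregular layout in the question.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
John Smith Manager jsmith@example.com 555-0100
Jane Doe Engineer jdoe@example.com
Bob Ray Analyst bray@example.com 555-0101 Springfield
EOF

usernames=$(awk '{
    for (i = 1; i <= NF; ++i) {
        if ($i ~ /.+@.+/) {         # field looks like an email
            sub(/@.*$/, "", $i)     # drop the @ and everything after it
            print $i                # what remains is the username
            next                    # skip the rest of this record
        }
    }
}' "$tmpfile")

printf '%s\n' "$usernames"
rm -f "$tmpfile"
```

This prints jsmith, jdoe, and bray on separate lines, regardless of which column the email sits in.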
Upvotes: 2
Reputation: 7081
I don't know awk at all, but I looked up the regex reference and this should be supported: \b([^ ]*@.*?)($|[^\w@.]) in which group 1 matches the email. This just searches for something after a word boundary that contains @. The match ends at the next non-word character other than @ and the dot.
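One caveat: awk uses POSIX extended regular expressions, which have no lazy quantifiers such as .*? and do not treat \b as a word boundary, so the pattern may not behave as intended there. A minimal sketch of the same idea (find the email anywhere on the line without looping over fields) using awk's match() and substr() follows; the sample line and the simplified pattern [^ @]+@[^ @]+ are assumptions for illustration.

```shell
# Hypothetical sample line (not from the original post).
line='Jane Doe Engineer jdoe@example.com 555-0100'

user=$(printf '%s\n' "$line" | awk '{
    # match() sets RSTART and RLENGTH to the leftmost-longest match.
    if (match($0, /[^ @]+@[^ @]+/)) {
        email = substr($0, RSTART, RLENGTH)   # the whole address
        sub(/@.*/, "", email)                 # keep only the part before @
        print email
    }
}')

printf '%s\n' "$user"
```

Working on a copy (email) rather than on $i also leaves the original record untouched.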
Upvotes: 0