awk finding a column and trimming

Question

I have a text file with an irregular structure like the following:

first_name1 last_name1 designation1 email1 phone_number1
first_name2 last_name2 designation2 email2
first_name3 last_name3 designation3 email3 phone_number3 address3

As you can see, the email can be in the last column, the second-to-last column, or the third-to-last column, so I cannot simply use $NF to get it. My goal is to find the email address wherever it is on the line and then extract the portion before the @. For instance, if email1 = foobar@dept.company.com, then I want to extract foobar. How can I write an awk query to extract the first portion of the email address? I tried the following, but it only looks for an exact match. How can I turn it into a regex to get the job done?

awk '{for(i=1;i<=NF;i++){ if($i=="foobar@dept.company.com"){print $i} } }' users.txt

e0k · Accepted Answer

You are comparing $i to the string "foobar@dept.company.com", so of course this only makes an exact comparison. What you seem to want is to test whether $i matches (~) a regular expression (/.../ instead of "..."), and then tailor the regex to your needs. Try something like:

awk '{for(i=1;i<=NF;++i){if ($i ~ /.+@.+/){sub(/@.*$/, "", $i); print $i; next}}}' users.txt
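A quick way to check the one-liner is to feed it a few rows shaped like the question's input. The names and addresses below are made up, and extract_users is only a hypothetical wrapper name; the awk program itself is the answer's, unchanged.

```shell
# extract_users: hypothetical wrapper around the answer's one-liner.
# For each line it finds the first field containing an @, strips the @
# and everything after it, prints what is left, and moves to the next line.
# With no arguments awk reads stdin; otherwise it reads the named files.
extract_users() {
  awk '{for(i=1;i<=NF;++i){if ($i ~ /.+@.+/){sub(/@.*$/, "", $i); print $i; next}}}' "$@"
}

# Made-up sample rows: the email is in the second-to-last, last,
# and third-to-last column respectively.
extract_users <<'EOF'
John Smith manager jsmith@dept.company.com 555-0100
Jane Doe engineer jdoe@dept.company.com
Bob Lee analyst blee@dept.company.com 555-0199 Springfield
EOF
```

This should print jsmith, jdoe and blee, one per line, even though the email sits in a different column on each line. For the real file you would call extract_users users.txt.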

The regex /.+@.+/ matches a string containing an @ with at least one character before it and at least one character after it. It will not match, for example, @foobar, foobar@, or a lone @. You might want to consider something more like /.+@.+\..+/, which matches (something)@(something).(something), since domain names usually contain a dot. You can make this regex more specific if you wish.
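To see what the two regexes accept, you can run a handful of candidate strings through both. match_report is a hypothetical helper name; for each input line it prints the line, then 1 or 0 for /.+@.+/, then 1 or 0 for the stricter /.+@.+\..+/.

```shell
# match_report: hypothetical helper. Prints each input line followed by
# whether it matches the looser regex (1/0) and the stricter one (1/0).
# In awk, a match expression like ($0 ~ /re/) evaluates to 1 or 0.
match_report() {
  awk '{ print $0, ($0 ~ /.+@.+/), ($0 ~ /.+@.+\..+/) }'
}

printf '%s\n' foobar@dept.company.com foobar@localhost @foobar foobar@ @ | match_report
```

Here foobar@dept.company.com matches both, foobar@localhost matches only the looser regex (there is no dot after the @), and @foobar, foobar@ and @ match neither.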

The sub(/@.*$/, "", $i) call replaces, within $i, everything from the first @ (inclusive) to the end of the field ($) with the empty string "". This strips the @ and the domain, leaving only the part before the @ (i.e. the username). The print $i prints it, and next moves on to the next line, skipping any remaining fields of the current record.
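If you would rather not modify the field, one alternative (not the answer's approach, just a variation) is to split() the matching field on @ and print the first piece. sub() rewrites $i, which also makes awk rebuild $0; that is harmless here because next follows immediately, but split() leaves the record untouched in case you later want to print other fields as well. extract_users_split is a hypothetical name.

```shell
# extract_users_split: hypothetical variant of the answer's loop.
# Instead of sub(), it splits the matching field at "@" into the
# array parts and prints parts[1], so $i and $0 are never modified.
extract_users_split() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /.+@.+/) {
        split($i, parts, "@")  # parts[1] = text before the first @
        print parts[1]
        next                   # skip the remaining fields of this line
      }
    }
  }' "$@"
}

printf '%s\n' 'Jane Doe engineer jdoe@dept.company.com' | extract_users_split
```

A single-character separator such as "@" is taken literally by split(), so no regex escaping is needed here.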
