Reputation: 97
I need to extract some specific parts from a very big (> 3 GB) text file.
,(1,'[email protected]',0,0,1,1,0,0,1), (2,'[email protected]',1,0,3,1,7,0,1), (3,'[email protected]',0,0,0,1,0,0,1), (4,'[email protected]',1,0,7,1,1,1,3), (5,'[email protected]',0,0,3,1,1,0,1), (6,'[email protected]',1,0,5,1,6,1,1),
And I need the first field, the email, and the third field (without the quotes), one record per line, as below:
1,[email protected],0
2,[email protected],1
3,[email protected],0
etc..
And if possible I also want to extract the domain name (like 1,[email protected],hotmail.com,0 ).
I can extract the emails with the following:
grep -o -E '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' test
and I tried a lot more...
like egrep -o -E '([^),(^]+)' test
, and sed, but with no luck.
I hope someone can help me out!
Upvotes: 0
Views: 55
Reputation: 5533
You may use tr to split the very long line into multiple lines. Then use tr again to remove the special characters like ( and ). Finally, use awk to print the expected columns.
tr ")('" "\n " < file | tr -d "[ ]" |awk -F"," '{print $2","$3","$4}'
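If the two tr passes feel heavy for a 3 GB file, the same extraction can be sketched with one tr plus one awk by making ) the record separator. The printf line below is small made-up sample data standing in for the real file:

```shell
# Sketch: delete spaces, parens and quotes, then let awk split records on ")".
# The printf data is a stand-in for the real 3 GB file.
printf ",(1,'a@hotmail.com',0,0,1), (2,'b@hotmail.com',1,0,3)" |
  tr -d " ('" |
  awk -v RS=")" -F"," 'NF > 2 { print $2 "," $3 "," $4 }'
# prints:
# 1,a@hotmail.com,0
# 2,b@hotmail.com,1
```

One process fewer in the pipeline; otherwise the behavior is the same as the command above.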
UPDATE
Then splitting the email on @ to get the hostname solves your problem:
tr ")" "\n" < file | tr -d "[ (']" |awk -F"," '{ split($3, a, "@"); print $2","$3","a[2]","$4;}'
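The split() call can be seen in isolation on a single already-cleaned record (user@example.com is made-up data): split($3, a, "@") puts the part before the @ into a[1] and the domain into a[2].

```shell
# Isolated demo of awk's split(): a[2] receives the domain part of $3.
# The leading empty field mimics a cleaned record from the pipeline above.
printf ",1,user@example.com,0\n" |
  awk -F"," '{ split($3, a, "@"); print $2 "," $3 "," a[2] "," $4 }'
# prints: 1,user@example.com,example.com,0
```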
FINAL UPDATE
Add a check so that only the legal (complete) lines are printed:
tr ")" "\n" < file | tr -d "[ (']" |awk -F"," '{ split($3, a, "@"); if (NF>2) {print $2","$3","a[2]","$4;}}'
OUTPUT
1,[email protected],hotmail.com,0
2,[email protected],hotmail.com,1
3,[email protected],live.com,0
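The NF>2 guard can be exercised with a made-up sample that contains a junk record: the junk survives the tr passes as a short line and is then dropped by the check.

```shell
# "junk" is a fabricated malformed record; only the complete record prints.
printf "junk), (1,'a@hotmail.com',0,0,1)" |
  tr ")" "\n" |
  tr -d "[ (']" |
  awk -F"," '{ split($3, a, "@"); if (NF > 2) { print $2 "," $3 "," a[2] "," $4 } }'
# prints: 1,a@hotmail.com,hotmail.com,0
```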
Upvotes: 1