Reputation: 1314

Parts of a match in regular expression with egrep

I was wondering if, with egrep ((GNU grep) 2.5.1), I can select a part of the matched text, something like:

egrep '^([a-zA-Z.-]+)[0-9]+' ./file.txt

So I get only the part which matched, between the brackets, something like

house.com

Instead of the whole line like I usually get:

house.com112

Assuming I have a line with house.com112 in my file.txt.

(Actually, this regular expression is just an example; I just want to know whether I can print only a part of the whole line.)

I do know in some languages, such as PHP, Perl or even AWK I can, but I do not know if I can with egrep.

Thank you in advance!

Upvotes: 6

Answers (4)

LF-DevJourney

Reputation: 28549

Use a lookahead in the regular expression (this needs grep -P, which enables Perl-compatible regexes):

$ echo 'house.com112' | grep -Po '([a-zA-Z.]+)(?=\d+)'
house.com

Upvotes: 2

David Kanarek

Reputation: 12613

The first part of your regex is more general than the second half, and since + is greedy, the second [0-9]+ will ~~never match anything~~ only match the last digit (thanks Paul). If you can make your first half more specific (e.g. if you know it will end in a TLD) you could do it.

There's an amazingly cool tool called ack, which is basically grep with Perl regexes. I'm not sure if it's possible to use in your case, but if you can do what you want in Perl, you can do it with ack.

Edit:

Why not just drop the end of the regex? Are there false positives if you do that? If so, you could pipe the results to egrep again with the first half of the regex only.

Also, on the off chance that you don't know about it, the -o flag will output only the matched portion of a given line. This seems to be what you are asking about.
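To make the -o behaviour concrete, here is a minimal sketch, assuming a grep that supports -E and -o (GNU grep does). The temporary file and its single line are made up for illustration and stand in for file.txt from the question:

```shell
# Hypothetical scratch file standing in for file.txt (an assumption).
tmpfile=$(mktemp)
printf 'house.com112\n' > "$tmpfile"

# Without -o, grep prints the whole matching line.
whole=$(grep -E '^[a-zA-Z.-]+[0-9]+' "$tmpfile")

# With -o, grep prints only the text the pattern matched.
# The trailing [0-9]+ is dropped here, so only the leading name is matched.
part=$(grep -Eo '^[a-zA-Z.-]+' "$tmpfile")

echo "whole line:   $whole"
echo "matched part: $part"

rm -f "$tmpfile"
```

Note that dropping the trailing digits from the pattern also lets lines without any digits match, which is the false-positive risk mentioned above.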

Upvotes: 3

ghostdog74

Reputation: 342609

You might want to try the -o and -w flags in grep. egrep is "deprecated", so use grep -E.

$ echo "test house.com house.com112"| grep -Eow "house.com"
house.com

With awk, the basic idea is to go through each word and test for equality.

$ echo "test house.com house.com112"| awk '{for(i=1;i<=NF;i++){ if($i=="house.com") print $i}}'
house.com

Upvotes: 3

Mark Byers

Reputation: 838666

Use sed to modify the result after grep has found the lines that match:

grep -E '^[a-zA-Z.-]+[0-9]+' ./file.txt | sed 's/[0-9]\+$//'

Or if you want to stick with only grep, you can use grep with the -o switch instead of sed:

grep -E '^[a-zA-Z.-]+[0-9]+' ./file.txt | grep -Eo '[a-zA-Z.-]+'
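If the goal is specifically to print what the parentheses captured, sed can do the filtering and the extraction in one step. This is only a sketch, not part of the answer above: it assumes a sed that accepts -E for extended regexes (GNU and BSD sed both do), and the input line is a hypothetical stand-in for file.txt:

```shell
# Hypothetical one-line input standing in for file.txt (an assumption).
line='house.com112'

# -n suppresses sed's default output; the trailing p prints only lines
# where the substitution succeeded. \1 is the text captured by the
# first pair of parentheses.
name=$(printf '%s\n' "$line" | sed -nE 's/^([a-zA-Z.-]+)[0-9]+.*$/\1/p')

echo "$name"
```

Lines that do not match the pattern produce no output, so no separate grep is needed in front of it.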

Upvotes: 11
