AMM

Reputation: 17930

In grep on Ubuntu, how can I display only the string that matched the regular expression?

I am grepping with a regular expression. In the output, I would like to see only the strings that match my regular expression.

In a bunch of XML files (mostly they are single-line files with huge amounts of data in a line), I would like to get all the words that start with MAIL_.

Also, I would like the grep command on the shell to give only the words that matched and not the entire line (which is the entire file in this case).

How do I do this?

I have tried

grep -Gril MAIL_* .
grep -Grio MAIL_* .
grep -Gro MAIL_* .

Upvotes: 14

Views: 33962

Answers (4)

Catalin Iacob

Reputation: 644

From your comment on Thor's answer, it seems you also want to distinguish whether the MAIL_.* text is a text node or an attribute, not just to isolate it wherever it appears in the XML document. Grep cannot parse XML; you need a proper XML parser for that.

One command-line XML parser is xmlstarlet, which is packaged in Ubuntu.

Using it on this example file:

$ cat test.xml 
<some_root>
    <test a="MAIL_as_attribute">will be printed if you want matching attributes</test>
    <bar>MAIL_as_text will be printed if you want matching text nodes</bar>
    <MAIL_will_not_be_printed>abc</MAIL_will_not_be_printed>
</some_root>

For selecting text nodes you can use:

$ xmlstarlet sel -t -m '//*' -v 'text()' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_text

And for selecting attributes:

$ xmlstarlet sel -t -m '//*[@*]' -v '@*' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_attribute

Brief explanations:

  • //* is an XPath expression that selects all elements in the document, and text() outputs the value of their child text nodes, so everything except text nodes gets filtered out
  • //*[@*] is an XPath expression that selects all elements that have at least one attribute, and @* then outputs the attribute values

Upvotes: 0

banx

Reputation: 4416

Try the following command

grep -Eo 'MAIL_[[:alnum:]_]*'
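As a minimal sketch of what this pattern does on single-line XML, the snippet below runs it against a made-up sample string (the element and attribute names are invented for illustration, not taken from the question's files):

```shell
# Sample single-line XML, invented for this illustration.
xml='<cfg><opt name="MAIL_HOST">MAIL_PORT=25</opt><MAIL_TAG/></cfg>'

# -E: extended regular expressions; -o: print only the matched text,
# one match per output line. The bracket expression allows letters,
# digits, and underscores, so a match ends at quotes, '=', '<', or '/'.
matches=$(printf '%s\n' "$xml" | grep -Eo 'MAIL_[[:alnum:]_]*')

printf '%s\n' "$matches"
```

Because the character class excludes markup characters, each match ends at the word boundary even when the whole file is one line. It also matches the element name MAIL_TAG, since grep does not know about XML structure.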

Upvotes: 6

thor

Reputation: 2284

First of all, with the GNU grep that ships with Ubuntu, the -G flag (basic regular expressions) is the default, so you can omit it. Even better, use extended regular expressions with -E.

The -r flag means a recursive search through the files of a directory, which is what you need.

And you are right to use the -o flag to print only the matching part of a line. To omit file names as well, you will need the -h flag.
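To see what -h changes in a recursive search, here is a small sketch using a throwaway directory; the file names and contents are hypothetical and exist only for this demonstration:

```shell
# Build a temporary directory with two files (names made up for the demo).
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '<a>MAIL_ONE</a>\n' > "$dir/a.xml"
printf '<b>MAIL_TWO</b>\n' > "$dir/sub/b.xml"

# -r -o without -h: each match is prefixed with the file it came from.
with_names=$(grep -ro 'MAIL_[[:upper:]]*' "$dir" | sort)

# Adding -h: only the matched text is printed.
without_names=$(grep -rho 'MAIL_[[:upper:]]*' "$dir" | sort)

printf '%s\n' "$with_names" "$without_names"

# Clean up the temporary directory.
rm -rf "$dir"
```

The first two output lines have the form path:MAIL_..., while the last two contain only the matched words.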

The only mistake is in the regular expression itself: there is no character specification before the *. As written, MAIL_* means MAIL followed by zero or more underscores. Your command should look like this:

grep -Ehro 'MAIL_[^[:space:]]*' .

Sample output (not recursive):

$ echo "Some garbage MAIL_OPTION comes MAIL_VALUE here" | grep -Eho 'MAIL_[^[:space:]]*'
MAIL_OPTION
MAIL_VALUE
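For comparison, a short sketch that runs the pattern from the question and the corrected pattern over the same sample line (the variable names are just for illustration):

```shell
line='Some garbage MAIL_OPTION comes MAIL_VALUE here'

# Pattern from the question: "_*" means zero or more underscores,
# so each match is just the literal prefix "MAIL_".
broken=$(printf '%s\n' "$line" | grep -o 'MAIL_*')

# Corrected pattern: the bracket expression matches a run of
# non-whitespace characters after the prefix.
fixed=$(printf '%s\n' "$line" | grep -Eo 'MAIL_[^[:space:]]*')

printf 'broken:\n%s\nfixed:\n%s\n' "$broken" "$fixed"
```

The pattern is quoted in both cases so that the shell does not try to expand the * as a file-name glob before grep sees it.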

Upvotes: 18

chocolate_jesus

Reputation: 101

grep -o or --only-matching

outputs only the matching text instead of complete lines. The problem could be that your regex is not restrictive enough, or is too greedy, and actually matches the whole file.
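A minimal sketch of that failure mode, using an invented one-line sample: with a greedy .* the match runs to the end of the line, so -o still prints nearly everything.

```shell
# One-line sample, made up for this illustration.
line='<x>MAIL_A</x><y>MAIL_B</y>'

# ".*" is greedy: the match starts at the first MAIL_ and runs to the
# end of the line, so -o prints one long match.
greedy=$(printf '%s\n' "$line" | grep -o 'MAIL_.*')

# Restricting the allowed characters ends each match at the end of the word.
strict=$(printf '%s\n' "$line" | grep -Eo 'MAIL_[[:alnum:]_]*')

printf 'greedy:\n%s\nstrict:\n%s\n' "$greedy" "$strict"
```

On a file that is a single huge line, the greedy form would print everything from the first MAIL_ to the end of the file.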

Upvotes: 2
