user3732793

Reputation: 1939

bash grep text within square brackets

I am trying to grep some text from a log file in bash on Linux. The text is enclosed in square brackets.

e.g. in:

32432423 jkhkjh [234] hkjh32 2342342

I am searching for 234.

Usually this regular expression finds it:

 \[(.*?)\]

but it does not work with grep:

|grep \[(.*?)\]

What is the correct way to do this regular expression search with grep?

Upvotes: 6

Views: 13253

Answers (4)

Wiktor Stribiżew

Reputation: 626690

To extract all values between square brackets, including the brackets, you can use a POSIX BRE-based grep command such as

grep -o '\[[^][]*]' file

...and BONUS solutions of the same kind:

grep -o '<[^<>]*>' file # Extracting all strings between angle brackets
grep -o '([^()]*)' file # Extracting all strings between parentheses
grep -o '{[^{}]*}' file # Extracting all strings between curly braces
grep -o '"[^"]*"'  file # Extracting all strings between double quotes
grep -o "'[^']*'"  file # Extracting all strings between single quotes

See the online grep demo. The -o option makes grep output matched substrings only, not whole lines, and the \[[^][]*] pattern matches a [, then 0 or more occurrences of any chars but [ and ] (see the negated [^][]* bracket expression), and then a ].
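As a quick check of the bracket-expression pattern, here is a minimal sketch that runs it on a line with two bracketed values. The sample line and the variable names are made up for illustration and are not from the log in the question.

```shell
# Hypothetical sample line with two bracketed fields.
line='32432423 jkhkjh [234] hkjh32 [abc] 2342342'

# -o prints each match on its own line; [^][]* cannot cross a bracket,
# so the two fields are reported separately.
matches=$(printf '%s\n' "$line" | grep -o '\[[^][]*]')
printf '%s\n' "$matches"
```

Each bracketed value comes out on its own line, brackets included.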

If you need to get the value inside square brackets, excluding the brackets themselves, you can use a PCRE-based grep command like

grep -oP '\[\K[^][]*(?=])' file

See another online demo

The \[\K[^][]*(?=]) pattern matches

  • \[ - a [ char
  • \K - a match reset operator that discards the text matched so far from the match memory buffer
  • [^][]* - 0 or more chars other than ] and [
  • (?=]) - a positive lookahead that requires a ] char immediately to the right of the current location.
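The -P option is a GNU grep extension and is not available in every build. One possible fallback, sketched here under the assumption that the values themselves never contain brackets, is to match with the brackets and strip them afterwards. The sample line is the one from the question; the variable names are illustrative.

```shell
line='32432423 jkhkjh [234] hkjh32 2342342'

# PCRE variant from above (needs GNU grep built with -P support):
#   printf '%s\n' "$line" | grep -oP '\[\K[^][]*(?=])'

# PCRE-free fallback: match including the brackets, then delete them with tr.
value=$(printf '%s\n' "$line" | grep -o '\[[^][]*]' | tr -d '[]')
printf '%s\n' "$value"
```

The tr step removes every [ and ] from the output, which is why this only suits values that cannot contain brackets.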

Upvotes: 7

James Brown

Reputation: 37394

I prefer \\[[^]]*] (that's: \\[ [ ^] ]* ], ie. anything-but-right-square-brackets in square brackets) over \\[.*] because of greediness:

$ grep -o \\[.*] <<<"[this] and that too]"
[this] and that too]

vs.

$ grep -o \\[[^]]*] <<<"[this] and that too]"
[this]

Then again grep is not the tool for everything (it was g/re/p after all). If you just want what's inside the square brackets, I'd use sed for that:

$ sed 's/.*\[\([^]]*\)].*/\1/' foo
234

ie. replace-everything-with-what's-in-parenthesis...sies.
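One caveat with the sed command: lines that contain no bracketed value are printed unchanged. A minimal sketch of a variant that prints only the lines where the substitution succeeded, using -n and the p flag; the two-line input is made up for illustration.

```shell
# Hypothetical input: the first line has a bracketed value, the second does not.
input='32432423 jkhkjh [234] hkjh32 2342342
no brackets on this line'

# -n suppresses the default output; the trailing p prints only substituted lines.
result=$(printf '%s\n' "$input" | sed -n 's/.*\[\([^]]*\)].*/\1/p')
printf '%s\n' "$result"
```

Because the leading .* is greedy, a line with several bracketed values yields only the last one.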

Upvotes: 1

chepner

Reputation: 530920

[ has special meaning to both the shell and grep, so you need to quote it twice. The backslashes prevent grep from treating them as part of a bracket expression; quoting the entire thing prevents the shell from trying to expand the regular expression as a pattern before passing it to grep.

... | grep '\[(.*?)\]'

In your attempt, the shell used the backslashes to treat the brackets literally and then stripped them, so the command was approximately equivalent to ... | grep '[(.*?)]'.
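Quoting is only half of the story. In grep's default basic regular expressions, ( ) and ? are literal characters, so the quoted pattern finds the sample line only when the Perl-style syntax is enabled with -P. A small sketch of the difference, assuming GNU grep built with PCRE support; the variable names are illustrative.

```shell
line='32432423 jkhkjh [234] hkjh32 2342342'

# Default BRE: ( ) and ? are literal, so nothing on this line matches.
# "|| true" keeps grep's no-match exit status from aborting a strict script.
bre=$(printf '%s\n' "$line" | grep -o '\[(.*?)\]' || true)

# With -P the group and the lazy quantifier behave as in Perl.
pcre=$(printf '%s\n' "$line" | grep -oP '\[(.*?)\]')

printf 'BRE:  "%s"\nPCRE: "%s"\n' "$bre" "$pcre"
```

The BRE run produces no match, while the -P run prints the bracketed value with its brackets.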

Upvotes: 0

fedorqui

Reputation: 289505

You can match an opening bracket and then discard it from the result with the \K escape sequence, then match everything up to the closing bracket:

$ grep -Po '\[\K[^]]*' <<< "32432423 jkhkjh [234] hkjh32 2342342"
234

Note that you can omit -P (Perl-compatible regular expressions) by saying:

$ grep -o '\[.*]' <<< "32432423 jkhkjh [234] hkjh32 2342342"
[234]

However, as you can see, this also prints the brackets. That is why -P is useful: it provides \K as well as look-behind and look-ahead assertions.

You also use ? in your regexp. As you probably know, *? makes the match non-greedy. Let's see an example:

$ grep -Po '\[.*?]' <<< "32432423 jkhkjh [23]4] hkjh32 2342342"
[23]
$ grep -Po '\[.*]' <<< "32432423 jkhkjh [23]4] hkjh32 2342342"
[23]4]

With .*?, the match in [23]4] is [23]. With plain .*, the match extends to the last ], giving [23]4]. Non-greedy quantifiers only work with the -P option.
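Without -P, a similar shortest-match effect can be had by replacing the lazy quantifier with a negated bracket expression. A minimal sketch on the same input; the variable names are illustrative.

```shell
line='32432423 jkhkjh [23]4] hkjh32 2342342'

# Greedy .* in default BRE runs to the last ] on the line.
greedy=$(printf '%s\n' "$line" | grep -o '\[.*]')

# [^]]* cannot cross a ], so the match stops at the first closing bracket.
bounded=$(printf '%s\n' "$line" | grep -o '\[[^]]*]')

printf '%s\n%s\n' "$greedy" "$bounded"
```

This works because the bracket expression excludes the delimiter itself, so the engine never has a longer match to prefer.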

Upvotes: 9
