guilin 桂林

Reputation: 17422

How to write this grep regex?

if [ '`echo "$url" | grep (\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$`' ] ; then

This gives a syntax error. I just want to check the file extension.

Upvotes: 2

Views: 2481

Answers (3)

Brad Christie

Reputation: 101594

Try this instead:

grep -E '\.(tar\.gz|tar\.bz2|zip|rar|7z)$'

Upvotes: 0

moinudin

Reputation: 138317

First of all, you need to remove the single quotes (''); otherwise the test is just a non-empty string, which always evaluates to true. You need to put the regex in quotes, because parentheses are interpreted by bash. You also need to use egrep (equivalent to grep -E).

if [ `echo "$url" | egrep "(\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"` ] ; then

You can also shorten the regex by factoring the . out of the group and grouping tar.{gz,bz2}:

if [ `echo "$url" | egrep "\.(tar\.(gz|bz2)|zip|rar|7z)$"` ] ; then

In future, please take note of the error bash gives you, which tells you quite a lot: "bash: syntax error near unexpected token (" suggests that the error is around the (, which I've just shown you is exactly where the error lies.

Jonathan's answer offers more tips on improving the test.

Upvotes: 4

Jonathan Leffler

Reputation: 753475

Given:

if [ '`echo "$url" | grep (\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$`' ] ; then

In itself, this isn't a syntax error; any syntax error is probably nearby.

On the other hand, this doesn't do what you want, either. The string between the square brackets is single-quoted, so it is taken literally and no command is run. The test checks whether the string is empty (it isn't) and goes on to execute the code in the then clause.
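To see the effect of the single quotes, here is a small sketch. The URL and the variable names are made up for illustration, and the regex is cut down to one suffix to keep it short:

```shell
#!/bin/sh
# Illustration only: a non-empty string inside [ ] always tests as true.
url="http://example.com/notes.txt"   # hypothetical URL with no archive suffix

# Single quotes suppress command substitution, so the test sees literal text.
if [ '`echo "$url" | grep -E "\.zip$"`' ]; then
    single_quoted="true"
else
    single_quoted="false"
fi

# With the substitution actually performed, grep prints nothing for this
# URL, so the test sees an empty string.
if [ "$(echo "$url" | grep -E '\.zip$')" ]; then
    substituted="true"
else
    substituted="false"
fi

echo "single-quoted test: $single_quoted"
echo "substituted test:   $substituted"
```

The first test comes out true even though the URL does not end in .zip; only the second one depends on what grep prints.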

You need to use something like:

if [ "$(echo "$url" | grep -E '\.(tar\.gz|tar\.bz2|zip|rar|7z|tgz)$')" ] ; then

  • Use '$(...)' in preference to back-ticks.
  • Quote the command substitution so that a URL containing spaces still reaches the test as a single word.
  • Use 'grep -E' to activate extended regular expressions.
  • Factor out the leading '.' of the extensions.
  • Remember that '.tgz' is a valid (although rare) extension for gzipped tar files.

And then, as Dennis points out in a comment, you can observe that it is not necessary to use the test command or command substitution at all:

if echo "$url" | grep -E '\.(tar\.gz|tar\.bz2|zip|rar|7z|tgz)$' >/dev/null ; then

This checks the exit status of the pipeline, which is the exit status from grep, which will be 0 (success) if one of the suffixes was recognized and 1 (failure) if none of them was. And then, if this is Bash that we're using, you can avoid the pipeline too:

if grep -E '\.(tar\.gz|tar\.bz2|zip|rar|7z|tgz)$' <<< "$url" >/dev/null ; then
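If the check is needed in more than one place, the exit-status approach can be wrapped in a function. This is only a sketch: the name has_archive_suffix is made up, it uses grep's -q option in place of the >/dev/null redirection, and it uses printf rather than echo so that odd-looking arguments are passed through unchanged:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the argument ends in
# one of the archive suffixes, and fails (exit status 1) otherwise.
has_archive_suffix() {
    # -q makes grep print nothing; only its exit status is used.
    printf '%s\n' "$1" | grep -Eq '\.(tar\.gz|tar\.bz2|zip|rar|7z|tgz)$'
}

if has_archive_suffix "http://example.com/src.tar.bz2"; then
    echo "archive"
else
    echo "not an archive"
fi
```

The function's exit status is that of the last command in it, which is the pipeline, which in turn is grep's.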

And we can also avoid using a second process at all by rewriting the code to use a case statement:

case "$url" in
(*.tar.gz|*.tar.bz2|*.zip|*.rar|*.7z|*.tgz|*.xz)
    # Do what was in the 'then' clause
    ;;
(*) # Do what was in the 'else' clause
    ;;
esac

  • Note that .xz is also a compression scheme that you might encounter.
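The case version can also be packaged as a function so that it drops into an ordinary if statement. Again, a sketch only; the function name is_archive and the sample URLs are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper built on the case statement: no grep, no pipeline,
# no extra process.
is_archive() {
    case "$1" in
        *.tar.gz|*.tar.bz2|*.zip|*.rar|*.7z|*.tgz|*.xz) return 0 ;;
        *) return 1 ;;
    esac
}

for url in "http://example.com/a.tar.gz" "http://example.com/readme.txt"; do
    if is_archive "$url"; then
        echo "$url: archive"
    else
        echo "$url: not an archive"
    fi
done
```

The patterns here are shell globs, not regular expressions, so the dots are literal and need no backslashes.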

Upvotes: 3
