G_T

Reputation: 1587

Bash regex to check file extensions

I am trying to check whether a given file has one of the extensions I expect: .fa, .fasta or .fasta.gz. Judging by other questions this should be trivial, but none of the suggestions I tried work for me.

This is what I have tried; none of these attempts match:

#!/bin/bash

test1="abcdef.fa"
test2="ghijkl.fasta"
test3="mnopqr.fasta.gz"
echo "test1: $test1"
echo "test2: $test2"
echo "test3: $test3"

# Attempt 1
if [[ $test1 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test1\n"; fi
if [[ $test2 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test2\n"; fi
if [[ $test3 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test3\n"; fi

# Attempt 2 - do I need to quote the string?
if [[ "$test1" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test1\n"; fi
if [[ "$test2" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test2\n"; fi
if [[ "$test3" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test3\n"; fi

# Attempt 3 - alternative regex
if [[ $test1 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test3\n"; fi

# Attempt 4 - again with the quoted string
if [[ "$test1" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test1\n"; fi
if [[ "$test2" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test2\n"; fi
if [[ "$test3" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test3\n"; fi

# Attempt 5 - put $ on end of regex
if [[ $test1 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test3\n"; fi

# Attempt 6 - again with the quoted string
if [[ "$test1" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test1\n"; fi
if [[ "$test2" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test2\n"; fi
if [[ "$test3" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test3\n"; fi

# Attempt 7 - use double ||
if [[ $test1 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test3\n"; fi

I am close with this:

# Attempt 8 - escape parentheses
if [[ $test1 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test1\n"; fi
if [[ $test2 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test2\n"; fi
if [[ $test3 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test3\n"; fi

However the first test does not work and the output looks like this:

test1: abcdef.fa
test2: ghijkl.fasta
test3: mnopqr.fasta.gz
Attempt8: Match with ghijkl.fasta
Attempt8: Match with mnopqr.fasta.gz

What am I missing?

Upvotes: 1

Views: 570

Answers (4)

chepner

Reputation: 530970

You can use either regular-expression matching or pattern matching with [[ ... ]].

# regular expression
[[ $test1 =~ \.(fa|fasta|fasta\.gz)$ ]]

# pattern match
[[ $test1 = *.@(fa|fasta|fasta.gz) ]]

Regular expressions aren't anchored to either end of the string, so you need a trailing $ to ensure the extension actually occurs at the end of the string, not just somewhere in the middle. An unescaped . matches any character, so a literal dot has to be written as \. in the regex. The (...) is a list of alternatives to choose from.

Pattern matches are anchored to both ends, so you need the * to match all of the string up to the extension. The @(...) is a list of alternatives to choose from.

Quoting the left-hand operand is optional in both cases.
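To compare the two forms side by side, here is a minimal sketch. The function names is_fasta_regex and is_fasta_glob, and the extra names notes.txt and fasta.gz.bak, are made up for illustration. The shopt line is only a precaution for older bash versions; it is an assumption that your bash may need it.

```shell
#!/usr/bin/env bash
# Precaution (assumption): enable extended globs before the functions are parsed.
shopt -s extglob

# Hypothetical helper: regular-expression form, anchored at the end with $.
is_fasta_regex() {
  local re='\.(fa|fasta|fasta\.gz)$'
  [[ $1 =~ $re ]]
}

# Hypothetical helper: pattern-match form, implicitly anchored at both ends.
is_fasta_glob() {
  [[ $1 == *.@(fa|fasta|fasta.gz) ]]
}

for name in abcdef.fa ghijkl.fasta mnopqr.fasta.gz notes.txt fasta.gz.bak; do
  is_fasta_regex "$name" && r=yes || r=no
  is_fasta_glob  "$name" && g=yes || g=no
  printf '%-16s regex=%s glob=%s\n' "$name" "$r" "$g"
done
```

Both helpers should agree on every name; the last one is there to show that an extension in the middle of the name is not enough.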

Upvotes: 1

Jetchisel

Reputation: 7791

You could try a case statement, something like:

case "$test1" in
  *.fa|*.fasta|*.fasta.gz) printf 'Attempt1: Match with %s\n' "$test1";;
esac

case "$test2" in
  *.fa|*.fasta|*.fasta.gz) printf 'Attempt1: Match with %s\n' "$test2";;
esac

case "$test3" in
  *.fa|*.fasta|*.fasta.gz) printf 'Attempt1: Match with %s\n' "$test3";;
esac

  • See help case

  • See LESS='+/case word in' man bash
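If you need the check in more than one place, the same case statement can go into a function. This is only a sketch: the name fasta_kind and the labels it prints are invented here, and splitting compressed from plain files is an assumption about what you might want to do next.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a label for the extension, fail for anything else.
fasta_kind() {
  case "$1" in
    *.fasta.gz)   printf 'compressed\n' ;;
    *.fa|*.fasta) printf 'plain\n' ;;
    *)            printf 'other\n'; return 1 ;;
  esac
}

for name in abcdef.fa ghijkl.fasta mnopqr.fasta.gz notes.txt; do
  printf '%s: %s\n' "$name" "$(fasta_kind "$name")"
done
```

Since case only uses glob patterns, this also works in a plain POSIX sh, not just bash.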

Upvotes: 3

konsolebox

Reputation: 75478

=~ is supposed to accept regex patterns and not glob patterns. Try \.(fa|fasta|fasta\.gz)$.

Also you can use extended pattern matching: [[ $test1 == *.@(fa|fasta|fasta.gz) ]]
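The escaped dot in fasta\.gz matters because an unescaped . in a regex matches any character. A small sketch to see the difference; the helper name matches and the file name mnopqr.fastaXgz are made up for the demonstration.

```shell
#!/usr/bin/env bash

loose='\.(fa|fasta|fasta.gz)$'    # unescaped dot before gz: matches any character
strict='\.(fa|fasta|fasta\.gz)$'  # escaped dot: matches a literal dot only

# Hypothetical helper: succeed when name ($1) matches regex ($2).
matches() { [[ $1 =~ $2 ]]; }

for name in mnopqr.fasta.gz mnopqr.fastaXgz; do
  for re in "$loose" "$strict"; do
    if matches "$name" "$re"; then verdict='match'; else verdict='no match'; fi
    printf '%-18s %-26s %s\n' "$name" "$re" "$verdict"
  done
done
```

The loose regex accepts mnopqr.fastaXgz while the strict one rejects it; both accept the real .fasta.gz name.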

Upvotes: 1

Philippe

Reputation: 26457

It's much easier to define the regex in a variable:

#!/usr/bin/env bash

test1="abcdef.fa"
test2="ghijkl.fasta"
test3="mnopqr.fasta.gz"
echo "test1: $test1"
echo "test2: $test2"
echo "test3: $test3"

# Escape the dot in fasta\.gz so it matches a literal dot only
pattern='\.(fa|fasta|fasta\.gz)$'
# Attempt 1 - pass the name as a printf argument, not inside the format string
if [[ $test1 =~ $pattern ]]; then printf 'Attempt1: Match with %s\n' "$test1"; fi
if [[ $test2 =~ $pattern ]]; then printf 'Attempt1: Match with %s\n' "$test2"; fi
if [[ $test3 =~ $pattern ]]; then printf 'Attempt1: Match with %s\n' "$test3"; fi

Upvotes: 1
