steve lin

Reputation: 109

How to list filenames with regular expression

I have multiple files with different suffixes:

file_12
file_34
file_1245
file_4567
file_4

I only want to list the files whose suffix has exactly four digits. I know I could do it like this:

ls -l file_[0-9][0-9][0-9][0-9]

But is there a better way to do it?

Upvotes: 1

Views: 4216

Answers (5)

drewyupdrew

Reputation: 1609

Well, that's a glob pattern, not a regular expression, but if you want to use a regex you can use grep:

ls -l | grep -E 'file_[0-9]{4}$'

Keep in mind that the pattern isn't anchored to the beginning of the filename, so it would still match a file called foo_file_1245.

 

EDIT: @mklement0 has suggested, and I agree, that this is a better approach if you must use grep/regex:

ls -l | grep -E '[[:space:]]file_[0-9]{4}$'

EDIT 2: I realized that the above command would still erroneously match a filename that contains a space, such as someprefix file_1234. To be extra safe, apply the regex only to the last column (the filename):

ls -l | awk '{print $9}' | grep -E '^file_[0-9]{4}$'

(Note that this prints only the filenames, not the full ls -l lines. But frankly, OP, at this point your glob solution is probably faster.)
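To see why the anchors matter, here's a small sketch that feeds a hypothetical list of names (one per line, standing in for the filename column of ls -l) to both the end-anchored and the fully anchored regex:

```shell
#!/bin/sh
# Hypothetical names, one per line; foo_file_1245 is the example from above.
names='file_12
file_1245
foo_file_1245
file_12345'

# Anchored only at the end: still matches foo_file_1245.
# file_12345 is rejected because $ requires the name to end after 4 digits.
unanchored=$(printf '%s\n' "$names" | grep -E 'file_[0-9]{4}$')

# Anchored at both ends: only "file_" followed by exactly four digits.
anchored=$(printf '%s\n' "$names" | grep -E '^file_[0-9]{4}$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

Only the ^ anchor separates file_1245 from foo_file_1245; the $ anchor is what rules out longer digit runs such as file_12345.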

Upvotes: 2

mklement0

Reputation: 437833

It depends on what you mean by better, which comes down to a trade-off between efficiency and convenience:

  • If by better you mean more efficient:

    • No: you've already found the most efficient way. The globbing pattern
      file_[0-9][0-9][0-9][0-9] is resolved by the shell itself, in-process, and the shell then passes the matching filenames to ls -l.
    • Benjamin W., in a comment on the question, helpfully points to an article explaining the differences between shell patterns (globbing patterns) and regular expressions: http://mywiki.wooledge.org/BashGuide/Patterns
  • If by better you mean more convenient:

    • Using regular expressions allows you to use quantifiers (duplication symbols), which simplifies the expression to ^file_[0-9]{4}$ or ^file_\d{4}$, depending on the regex dialect.
    • This comes at the expense of requiring either:
      • a slow shell loop with conditionals using Bash's regex-matching =~ operator - see rob mayoff's helpful answer
      • involving another external utility that understands regular expressions, such as grep -E (egrep) - see drewyupdrew's helpful answer.
        • A slightly more robust version, utilizing only awk as an additional utility:
          ls -l | awk 'NF==9 && $NF ~ "^file_[0-9]{4}$"'

At the end of the day, even leaving efficiency considerations aside, neither regex-based solution simplifies the task at hand, given the additional code it requires.

Thus, unless your matching logic is too complex to be modeled with a shell (globbing) pattern (not the case here), I suggest you stick with your original approach.

Upvotes: 3

choroba

Reputation: 241868

Perl to the rescue:

perl -e 'system "ls", "-l", grep /^file_[0-9]{4}$/, glob "file_*";'

Upvotes: 0

rob mayoff

Reputation: 385610

No, there's not a particularly better way to do it. The way you're doing it with a shell glob is easy to understand and almost certainly efficient enough.
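A minimal sketch, assuming bash and a throwaway directory from mktemp -d (neither is part of the question), that expands the question's glob into an array so you can see exactly which names the shell would hand to ls -l:

```shell
#!/usr/bin/env bash
# Throwaway directory holding the question's sample files.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch file_12 file_34 file_1245 file_4567 file_4

# The shell expands the glob itself, before any command runs;
# each [0-9] matches exactly one digit.
matches=(file_[0-9][0-9][0-9][0-9])
printf '%s\n' "${matches[@]}"   # the names ls -l would receive
```

If nothing matches, bash by default passes the pattern through unchanged, so ls -l would report that file_[0-9][0-9][0-9][0-9] does not exist.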

Here's a worse way to do it: use the =~ operator in the [[ command to match filenames one at a time, like this:

for f in *; do
    # anchored, so names such as file_12345 or foo_file_1245 don't match
    [[ $f =~ ^file_[0-9]{4}$ ]] && ls -l "$f"
done
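A minimal sketch of that loop, assuming bash, with the regex anchored as ^file_[0-9]{4}$ and the matches collected into an array instead of calling ls -l once per file. The extra file_12345 is hypothetical and is there to show that the anchored pattern rejects longer digit runs:

```shell
#!/usr/bin/env bash
# Throwaway directory with the question's files plus file_12345.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch file_12 file_34 file_1245 file_4567 file_4 file_12345

# Keeping the regex in a variable and leaving it unquoted after =~
# stops bash from treating (quoted) parts of it as literal text.
re='^file_[0-9]{4}$'
matched=()
for f in *; do
    [[ $f =~ $re ]] && matched+=("$f")
done
printf '%s\n' "${matched[@]}"
```

Every filename still passes through a bash-level test here, which is why this is slower than letting the glob do the filtering.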

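Here's another worse way: use the find command. Note that -regex matches the whole path (e.g. ./file_1245), and that -E is specific to BSD/macOS find; GNU find uses -regextype instead:

find -E . -maxdepth 1 -regex '\./file_[0-9]{4}' -print0 | xargs -0 ls -l
find . -maxdepth 1 -regextype posix-extended -regex '\./file_[0-9]{4}' -print0 | xargs -0 ls -l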
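A minimal sketch for checking the find approach, assuming GNU find (for -regextype, rather than the BSD/macOS -E flag) and a throwaway directory. It prints the matching paths instead of piping them to xargs, so the result is easy to inspect:

```shell
#!/bin/sh
# Assumes GNU find; throwaway directory with the question's files plus file_12345.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch file_12 file_34 file_1245 file_4567 file_4 file_12345

# -regex must match the entire path as find reports it ("./name"),
# so the pattern starts with \./ and needs no ^ or $ anchors.
found=$(find . -maxdepth 1 -type f -regextype posix-extended \
    -regex '\./file_[0-9]{4}' | sort)
printf '%s\n' "$found"
```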

Upvotes: 2

Kaizhe Huang

Reputation: 1006

ls -l | grep file | grep -E '_[0-9]{4}$'

Upvotes: -1
