Reputation: 16525
Is there a bash command which counts the number of files that match a pattern?
For example, I want to get the count of all files in a directory which match this pattern: log*
Upvotes: 314
Views: 219579
Reputation: 5391
This simple one-liner should work in any shell, not just bash:
ls -1q log* | wc -l
ls -1q
will give you one line per file, even if they contain whitespace or special characters such as newlines.
The output is piped to wc -l
which counts the number of lines.
Upvotes: 411
Reputation: 19545
This can be done with standard POSIX shell grammar.
Here is a simple count_entries
function:
#!/usr/bin/env sh
count_entries()
{
# Emulating Bash nullglob
# If argument 1 is not an existing entry
if [ ! -e "$1" ]
# argument is a returned pattern
# then shift it out
then shift
fi
echo $#
}
for a compact definition:
count_entries(){ [ ! -e "$1" ]&&shift;echo $#;}
Featured POSIX compatible file counter by type:
#!/usr/bin/env sh
count_files()
# Count the file arguments matching the file operator
# Synopsys:
# count_files operator FILE [...]
# Arguments:
# $1: The file operator
# Allowed values:
# -a FILE True if file exists.
# -b FILE True if file is block special.
# -c FILE True if file is character special.
# -d FILE True if file is a directory.
# -e FILE True if file exists.
# -f FILE True if file exists and is a regular file.
# -g FILE True if file is set-group-id.
# -h FILE True if file is a symbolic link.
# -L FILE True if file is a symbolic link.
# -k FILE True if file has its `sticky' bit set.
# -p FILE True if file is a named pipe.
# -r FILE True if file is readable by you.
# -s FILE True if file exists and is not empty.
# -S FILE True if file is a socket.
# -t FD True if FD is opened on a terminal.
# -u FILE True if the file is set-user-id.
# -w FILE True if the file is writable by you.
# -x FILE True if the file is executable by you.
# -O FILE True if the file is effectively owned by you.
# -G FILE True if the file is effectively owned by your group.
# -N FILE True if the file has been modified since it was last read.
# $@: The files arguments
# Output:
# The number of matching files
# Return:
# 1: Unknown file operator
{
operator=$1
shift
case $operator in
-[abcdefghLkprsStuwxOGN])
for arg; do
# If file is not of required type
if ! test "$operator" "$arg"; then
# Shift it out
shift
fi
done
echo $#
;;
*)
printf 'Invalid file operator: %s\n' "$operator" >&2
return 1
;;
esac
}
count_files "$@"
Example usages:
count_files -f log*.txt
count_files -d datadir*
Alternate count non-directory entries without a loop:
#!/bin/sh
# Creates strings of as many dots as expanded arguments
# dotted string for entries matching star pattern
star=$(printf '%.0s.' ./*)
# dotted string for entries matching star slash pattern (directories)
star_dir=$(printf '%.0s.' ./*/)
# dotted string for entries matching dot star pattern
dot_star=$(printf '%.0s.' ./.*)
# dotted string for entries matching dot star slash pattern (directories)
dot_star_dir=$(printf '%.0s.' ./.*/)
# Print pattern matches count excluding directories matches
printf 'Files count: %d\n' $((
${#star} - ${#star_dir} +
${#dot_star} - ${#dot_star_dir}
))
Upvotes: 2
Reputation: 33358
For a recursive search:
find . -type f -name '*.log' -printf x | wc -c
wc -c
will count the number of characters in the output of find
, while -printf x
tells find
to print a single x
for each result. This avoids any problems with files with odd names which contain newlines etc.
For a non-recursive search, do this:
find . -maxdepth 1 -type f -name '*.log' -printf x | wc -c
Upvotes: 88
Reputation: 23231
Here is a generic Bash function you can use in your scripts.
# @see https://stackoverflow.com/a/11307382/430062
function countFiles {
shopt -s nullglob
logfiles=($1)
echo ${#logfiles[@]}
}
FILES_COUNT=$(countFiles "$file-*")
Upvotes: 1
Reputation: 305
(not enough reputation to comment)
This is BUGGY:
ls -1q some_pattern | wc -l
If shopt -s nullglob
happens to be set, it prints the number of ALL regular files, not just the ones with the pattern (tested on CentOS-8 and Cygwin). Who knows what other meaningless bugs does ls
have?
This is CORRECT and much faster:
shopt -s nullglob; files=(some_pattern); echo ${#files[@]};
It does the expected job.
0.006
on CentOS, and 0.083
on Cygwin (in case it is used with care).
0.000
on CentOS, and 0.003
on Cygwin.
Upvotes: 12
Reputation: 1743
To count everything just pipe ls to word count line:
ls | wc -l
To count with pattern, pipe to grep first:
ls | grep log | wc -l
Upvotes: -3
Reputation: 1428
I've given this answer a lot of thought, especially given the don't-parse-ls stuff. At first, I tried
<WARNING! DID NOT WORK>
du --inodes --files0-from=<(find . -maxdepth 1 -type f -print0) | awk '{sum+=int($1)}END{print sum}'
</WARNING! DID NOT WORK>
which worked if there was only a filename like
touch $'w\nlf.aa'
but failed if I made a filename like this
touch $'firstline\n3 and some other\n1\n2\texciting\n86stuff.jpg'
I finally came up with what I'm putting below. Note I was trying to get a count of all files in the directory (not including any subdirectories). I think it, along with the answers by @Mat and @Dan_Yard , as well as having at least most of the requirements set out by @mogsie (I'm not sure about memory.) I think the answer by @mogsie is correct, but I always try to stay away from parsing ls
unless it's an extremely specific situation.
awk -F"\0" '{print NF-1}' < <(find . -maxdepth 1 -type f -print0) | awk '{sum+=$1}END{print sum}'
More readably:
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
This is doing a find specifically for files, delimiting the output with a null character (to avoid problems with spaces and linefeeds), then counting the number of null characters. The number of files will be one less than the number of null characters, since there will be a null character at the end.
To answer the OP's question, there are two cases to consider
1) Non-recursive search:
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -name "log*" -print0) | \
awk '{sum+=$1}END{print sum}'
2) Recursive search. Note that what's inside the -name
parameter might need to be changed for slightly different behavior (hidden files, etc.).
awk -F"\0" '{print NF-1}' < \
<(find . -type f -name "log*" -print0) | \
awk '{sum+=$1}END{print sum}'
If anyone would like to comment on how these answers compare to those I've mentioned in this answer, please do.
Note, I got to this thought process while getting this answer.
Upvotes: 2
Reputation: 2090
You can use the -R option to find the files along with those inside the recursive directories
ls -R | wc -l // to find all the files
ls -R | grep log | wc -l // to find the files which contains the word log
you can use patterns on the grep
Upvotes: 4
Reputation: 4192
You can define such a command easily, using a shell function. This method does not require any external program and does not spawn any child process. It does not attempt hazardous ls
parsing and handles “special” characters (whitespaces, newlines, backslashes and so on) just fine. It only relies on the file name expansion mechanism provided by the shell. It is compatible with at least sh, bash and zsh.
The line below defines a function called count
which prints the number of arguments with which it has been called.
count() { echo $#; }
Simply call it with the desired pattern:
count log*
For the result to be correct when the globbing pattern has no match, the shell option nullglob
(or failglob
— which is the default behavior on zsh) must be set at the time expansion happens. It can be set like this:
shopt -s nullglob # for sh / bash
setopt nullglob # for zsh
Depending on what you want to count, you might also be interested in the shell option dotglob
.
Unfortunately, with bash at least, it is not easy to set these options locally. If you don’t want to set them globally, the most straightforward solution is to use the function in this more convoluted manner:
( shopt -s nullglob ; shopt -u failglob ; count log* )
If you want to recover the lightweight syntax count log*
, or if you really want to avoid spawning a subshell, you may hack something along the lines of:
# sh / bash:
# the alias is expanded before the globbing pattern, so we
# can set required options before the globbing gets expanded,
# and restore them afterwards.
count() {
eval "$_count_saved_shopts"
unset _count_saved_shopts
echo $#
}
alias count='
_count_saved_shopts="$(shopt -p nullglob failglob)"
shopt -s nullglob
shopt -u failglob
count'
As a bonus, this function is of a more general use. For instance:
count a* b* # count files which match either a* or b*
count $(jobs -ps) # count stopped jobs (sh / bash)
By turning the function into a script file (or an equivalent C program), callable from the PATH, it can also be composed with programs such as find
and xargs
:
find "$FIND_OPTIONS" -exec count {} \+ # count results of a search
Upvotes: 6
Reputation: 206689
You can do this safely (i.e. won't be bugged by files with spaces or \n
in their name) with bash:
$ shopt -s nullglob
$ logfiles=(*.log)
$ echo ${#logfiles[@]}
You need to enable nullglob
so that you don't get the literal *.log
in the $logfiles
array if no files match. (See How to "undo" a 'set -x'? for examples of how to safely reset it.)
Upvotes: 73
Reputation: 4156
Lots of answers here, but some don't take into account
-l
)*.log
instead of log*
logs
that matches log*
)Here's a solution that handles all of them:
ls 2>/dev/null -Ubad1 -- log* | wc -l
Explanation:
-U
causes ls
to not sort the entries, meaning it doesn't need to load the entire directory listing in memory-b
prints C-style escapes for nongraphic characters, crucially causing newlines to be printed as \n
.-a
prints out all files, even hidden files (not strictly needed when the glob log*
implies no hidden files)-d
prints out directories without attempting to list the contents of the directory, which is what ls
normally would do-1
makes sure that it's on one column (ls does this automatically when writing to a pipe, so it's not strictly necessary)2>/dev/null
redirects stderr so that if there are 0 log files, ignore the error message. (Note that shopt -s nullglob
would cause ls
to list the entire working directory instead.)wc -l
consumes the directory listing as it's being generated, so the output of ls
is never in memory at any point in time.--
File names are separated from the command using --
so as not to be understood as arguments to ls
(in case log*
is removed)The shell will expand log*
to the full list of files, which may exhaust memory if it's a lot of files, so then running it through grep is be better:
ls -Uba1 | grep ^log | wc -l
This last one handles extremely large directories of files without using a lot of memory (albeit it does use a subshell). The -d
is no longer necessary, because it's only listing the contents of the current directory.
Upvotes: 92
Reputation: 7679
Here is my one liner for this.
file_count=$( shopt -s nullglob ; set -- $directory_to_search_inside/* ; echo $#)
Upvotes: 10
Reputation: 167
The accepted answer for this question is wrong, but I have low rep so can't add a comment to it.
The correct answer to this question is given by Mat:
shopt -s nullglob
logfiles=(*.log)
echo ${#logfiles[@]}
The problem with the accepted answer is that wc -l counts the number of newline characters, and counts them even if they print to the terminal as '?' in the output of 'ls -l'. This means that the accepted answer FAILS when a filename contains a newline character. I have tested the suggested command:
ls -l log* | wc -l
and it erroneously reports a value of 2 even if there is only 1 file matching the pattern whose name happens to contain a newline character. For example:
touch log$'\n'def
ls log* -l | wc -l
Upvotes: 13
Reputation: 4156
If you have a lot of files and you don't want to use the elegant shopt -s nullglob
and bash array solution, you can use find and so on as long as you don't print out the file name (which might contain newlines).
find -maxdepth 1 -name "log*" -not -name ".*" -printf '%i\n' | wc -l
This will find all files that match log* and that don't start with .*
— The "not name .*" is redunant, but it's important to note that the default for "ls" is to not show dot-files, but the default for find is to include them.
This is a correct answer, and handles any type of file name you can throw at it, because the file name is never passed around between commands.
But, the shopt nullglob
answer is the best answer!
Upvotes: 8
Reputation: 18574
ls -1 log* | wc -l
Which means list one file per line and then pipe it to word count command with parameter switching to count lines.
Upvotes: 0