Reputation: 3179
I am new to Linux. I have a directory with approximately 250,000 files, and I need to count the files whose names match a pattern.
I tried the following command:
ls -1 20061101-20131101_kh5x7tte9n_2010_* | wc -l
I got the following error message:
-bash: /bin/ls: Argument list too long
0
Please help. Thanks in advance
Upvotes: 43
Views: 75084
Reputation: 12728
First of all, it is better not to use ls, according to this article.
This problem can be solved in many ways. I will list some of the most elegant ones that come to mind.
count=$(printf '%s\n' *pattern* | wc -l)
#or
count=$(shopt -s nullglob; files=(*pattern*); echo ${#files[@]})
#or
count=$(file *pattern* | wc -l)
#or
count=$(stat -c "%n" *pattern* | wc -l)
#or
count=$(du -a *pattern* | wc -l)
#or
count=$(echo *pattern* | wc -w)
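The last one gives the wrong number when file names contain whitespace. Note also that file, stat and du are external commands, so with a very large number of matches they run into the same "Argument list too long" error as ls; printf and echo are shell builtins and do not.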
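The array variant can be wrapped in a small function. This is only a sketch: count_matches is a hypothetical name, not something from the answer, and it assumes Bash (it relies on shopt and arrays). Unlike the printf variant, which reports 1 when nothing matches because the unexpanded pattern itself is printed, this one reports 0.

```shell
#!/usr/bin/env bash
# count_matches is a hypothetical helper name used only for this sketch.
# Usage: count_matches DIR 'PATTERN'   (quote PATTERN so it reaches the function unexpanded)
count_matches() {
    (
        cd "$1" || exit 1
        shopt -s nullglob   # a glob with no matches expands to nothing
        IFS=                # no word splitting, so only globbing is applied below
        files=( $2 )        # unquoted on purpose: the glob is expanded by the shell itself
        echo "${#files[@]}"
    )
}
```

For the question's directory this would be called as count_matches . '20061101-20131101_kh5x7tte9n_2010_*'. Everything runs inside a subshell, so the nullglob and IFS settings do not leak into the calling shell.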
Upvotes: 0
Reputation: 289755
It might be better to use find for this:
find . -name "pattern_*" -printf '.' | wc -m
In your specific case:
find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_2010_*" -printf '.' | wc -m
find returns the list of files matching the criteria.
-maxdepth 1 restricts the search to the given directory, without descending into subdirectories (thanks Petesh!).
-printf '.' prints a dot for every match instead of the file name, so that names containing newlines won't skew the count.
Then wc -m reports the number of characters, which equals the number of matching files.
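The difference between counting lines and counting dots can be seen with a file name that contains a newline. This is a small sketch that assumes GNU find (for -printf); the scratch directory and the pattern_* file names are made up for the demonstration.

```shell
# Scratch directory with three matching files, one of which has a newline in its name.
tmp=$(mktemp -d)
touch "$tmp/pattern_a" "$tmp/pattern_b" "$tmp/pattern_with
newline"

# Counting lines of output: the embedded newline is counted as an extra file.
by_lines=$(find "$tmp" -maxdepth 1 -name 'pattern_*' | wc -l)

# Counting one dot per match: independent of what the names contain.
by_dots=$(find "$tmp" -maxdepth 1 -name 'pattern_*' -printf '.' | wc -m)

echo "wc -l on names:   $by_lines"   # 4, although there are only 3 files
echo "dots via -printf: $by_dots"    # 3

rm -rf "$tmp"
```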
Performance comparison of two possible options:
Let's create 10 000 files with this pattern:
$ for i in {1..10000}; do touch 20061101-20131101_kh5x7tte9n_201_$i; done
And then compare the time it takes to get the result with ls -1 ... or find ...:
$ time find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_201_*" | wc -l
10000
real 0m0.034s
user 0m0.017s
sys 0m0.021s
$ time ls -1 | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000
real 0m0.254s
user 0m0.245s
sys 0m0.020s
find is about 7 times faster here (0.034s versus 0.254s)! But if we use ls -1f, which disables sorting (thanks Petesh again!), then ls is even faster than find:
$ time ls -1f | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000
real 0m0.023s
user 0m0.020s
sys 0m0.012s
Upvotes: 74
Reputation: 1189
If you are attempting to do this on the command line on a Mac, you will soon find out that find does not support the -printf option.
To accomplish the same result as the solution proposed by fedorqui-supports-monica try this:
find . -name "pattern_*" -exec stat -f "." {} \; | wc -l
This will find all files matching the pattern you entered, print a . on its own line for each of them, and finally count the number of lines and output that number.
To limit your search depth to the current directory, add -maxdepth 1 to the command like so:
find . -maxdepth 1 -name "196288.*" -exec stat -f "." {} \; | wc -l
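A possible variation, not part of the answer above, is to let find hand the names to the external printf in batches instead of starting one stat per file. The sketch below assumes a find that supports -maxdepth and -exec ... {} + (both GNU and BSD find do) and uses made-up file names; I have not benchmarked it against the stat version.

```shell
# Scratch directory with three matching files and one that should be ignored.
tmp=$(mktemp -d)
touch "$tmp/196288.a" "$tmp/196288.b" "$tmp/196288.c" "$tmp/unrelated"

# '%.0s' consumes one argument and prints nothing from it, then the literal dot is printed.
# printf repeats its format for every remaining argument, so each name yields one dot.
count=$(find "$tmp" -maxdepth 1 -name '196288.*' -exec printf '%.0s.' {} + | wc -c)

echo "matches: $count"

rm -rf "$tmp"
```

Because the dots are written without newlines, the count is taken with wc -c rather than wc -l. When nothing matches, printf is never run and the count is 0.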
Upvotes: 5
Reputation: 189417
You should generally avoid ls in scripts. In fact, performing the calculation in a shell function avoids the "Argument list too long" error, because there is no exec boundary and so the ARG_MAX limit doesn't come into play.
number_of_files () {
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}
The conditional guards against the glob not being expanded at all (leaving an unmatched pattern as-is is the default out of the box; in Bash, you can shopt -s nullglob to make wildcards which don't match any files expand to nothing instead).
Try it:
number_of_files 20061101-20131101_kh5x7tte9n_2010_*
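To see both branches of the conditional, the function can be tried in a scratch directory. This is a sketch with made-up file names (sample_*, missing_*); the getconf ARG_MAX line just prints the limit that applies when an external program is executed, which the function never runs into.

```shell
# Same function as in the answer, repeated so the sketch is self-contained.
number_of_files () {
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

# Limit (in bytes) on the arguments plus environment passed to an exec'd program.
getconf ARG_MAX

start_dir=$PWD
demo_dir=$(mktemp -d)                   # scratch directory for the demonstration
cd "$demo_dir" || exit 1
touch sample_1 sample_2 sample_3

matched=$(number_of_files sample_*)     # the glob expands to three names
unmatched=$(number_of_files missing_*)  # the glob stays literal, so the -e test fails
echo "matched=$matched unmatched=$unmatched"

cd "$start_dir" && rm -rf "$demo_dir"
```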
Upvotes: 1
Reputation: 11
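ls -1 | grep '^20061101-20131101_kh5x7tte9n_2010_' | wc -l
The previous answer did not put quotes around the search pattern. Note that grep takes a regular expression rather than a shell wildcard, so a trailing * is not needed here; anchoring the pattern with ^ matches names that start with the prefix.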
Upvotes: -3
Reputation: 5998
You got "Argument list too long" because the shell expands your pattern into the full list of file names before running ls. Try:
find . -maxdepth 1 -name '20061101-20131101_kh5x7tte9n_2010_*' | wc -l
Note that the pattern is enclosed in quotes to prevent shell expansion.
Upvotes: 6