Alistair Colling

Reputation: 1563

How can I count the number of words in a directory recursively?

I'm trying to calculate the number of words written in a project. There are a few levels of folders and lots of text files within them.

Can anyone help me find out a quick way to do this?

Bash or Vim would be good!

Thanks

Upvotes: 12

Views: 13840

Answers (5)

Yeikel

Reputation: 935

Assuming you don't need to count recursively and you want to include every file in the current directory, you can use a simple approach such as:

wc -w *


10  000292_0
500 000297_0
510 total

If you want to count the words only in files with a specific extension in the current directory, you could try:

cat *.txt | wc -w

Upvotes: 0

therealneil

Reputation: 5310

tl;dr

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc

Explanation:

find . -type f -exec wc -w {} + runs wc -w on every regular file under . (the current working directory), recursively. find executes wc as few times as possible, but as many times as necessary to stay within ARG_MAX, the system limit on the combined size of a command's arguments and environment. When the number of files (and the combined length of their paths) would exceed ARG_MAX, find invokes wc -w more than once, and each invocation prints its own total line:

$ find . -type f -exec wc -w {} + | awk '/total/{print $0}'
  8264577 total
  654892 total
 1109527 total
 149522 total
 174922 total
 181897 total
 1229726 total
 2305504 total
 1196390 total
 5509702 total
  9886665 total

Isolate these partial sums by printing only the first whitespace-delimited field of each total line:

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}'
8264577
654892
1109527
149522
174922
181897
1229726
2305504
1196390
5509702
9886665

Use paste to join the partial sums with a + delimiter, giving an infix summation (BSD/macOS paste needs an explicit - operand for standard input: paste -sd+ -):

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+
8264577+654892+1109527+149522+174922+181897+1229726+2305504+1196390+5509702+9886665

Evaluate the infix summation using bc, which supports both infix expressions and arbitrary precision:

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
30663324
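
The pipeline above relies on wc printing a total line for every batch, which it only does when it is given more than one file, and the /total/ pattern also matches any file whose name contains that word. Here is a minimal sketch that avoids both, assuming a POSIX sh and find. The function name count_words_tree is made up for illustration, and awk adds the numbers as floating point, so unlike bc it is not arbitrary precision:

```shell
# count_words_tree DIR: print the total word count of all regular files under DIR.
# (count_words_tree is a hypothetical helper name, not a standard command.)
count_words_tree() {
    # wc reads each file from stdin, so it prints only a number (no file name
    # and no "total" line); awk then adds up one number per file.
    find "${1:-.}" -type f -exec sh -c '
        for f in "$@"; do
            wc -w < "$f"
        done
    ' sh {} + | awk '{ sum += $1 } END { print sum + 0 }'
}
```

Run it as count_words_tree . (or with any other directory). Because no file name ever reaches awk, spaces or the word "total" in a name should not affect the sum.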

Upvotes: 5

janos

Reputation: 124704

You could find and print all the content and pipe to wc:

find path -type f -exec cat {} \; -exec echo \; | wc -w

Note: the -exec echo \; is needed in case a file doesn't end with a newline character. Without it, the last word of that file and the first word of the next would run together and be counted as one word.
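
To see the effect of the missing newline, here is a small throwaway demo. It assumes mktemp is available, and the file names and contents are invented for the example:

```shell
# Demonstrates why a separator is needed between concatenated files.
demo_dir=$(mktemp -d)
printf 'alpha beta'  > "$demo_dir/a"   # no trailing newline
printf 'gamma delta' > "$demo_dir/b"   # no trailing newline

# Plain concatenation glues "beta" and "gamma" into one word.
# $(( ... )) strips the padding some wc implementations add.
joined=$(( $(cat "$demo_dir/a" "$demo_dir/b" | wc -w) ))

# Printing a newline after each file keeps the words apart.
separated=$(( $(find "$demo_dir" -type f -exec cat {} \; -exec echo \; | wc -w) ))

echo "without separator: $joined"      # 3
echo "with separator:    $separated"   # 4
rm -rf "$demo_dir"
```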

Or you could find and wc and use awk to aggregate the counts:

find . -type f -exec wc -w {} \; | awk '{ sum += $1 } END { print sum }'

Upvotes: 4

miken32

Reputation: 42700

If there's one thing I've learned from all the questions on SO, it's that a filename with a space will mess you up. This script works even when file names contain whitespace. It needs Bash 4 or later for globstar and counts only .txt files, so adjust the pattern as needed.

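#!/usr/bin/env bash

# globstar makes ** recurse; nullglob skips the loop when nothing matches
shopt -s globstar nullglob
count=0
for f in **/*.txt
do
    # skip directories that happen to be named *.txt
    [[ -f $f ]] || continue
    # read via stdin so wc prints only the number
    words=$(wc -w < "$f")
    count=$((count + words))
done
echo "$count"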

Upvotes: 3

karakfa

Reputation: 67507

Use find to scan the directory tree; wc will do the rest.

$ find path -type f | xargs wc -w | tail -1

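The last line gives the total, provided xargs can pass all the files to a single wc invocation and no file name contains whitespace.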
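
For trees with many files, or file names containing spaces, a variant along these lines may be more robust. It is a sketch that assumes find supports -print0 and xargs supports -0 (GNU and BSD both do), and that wc labels its summary line "total" (the C/English locale). The function name count_words_xargs is hypothetical:

```shell
# count_words_xargs DIR: hypothetical helper; sums per-file counts from wc.
count_words_xargs() {
    # NUL-delimited names survive spaces; xargs may still run wc several times,
    # so skip each batch's "total" line and add up the per-file counts instead.
    find "${1:-.}" -type f -print0 |
        xargs -0 wc -w |
        awk '!(NF == 2 && $2 == "total") { sum += $1 } END { print sum + 0 }'
}
```

File names that contain newlines would still confuse the awk step, since wc prints one line per file.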

Upvotes: 14
