user81371

Reputation: 101

Count the number of whitespaces in a file

File test

musically us
challenged a goat that day
spartacus was his name
ba ba ba blacksheep

All give me 4, when I expect 11

grep --version -> grep (BSD grep) 2.5.1-FreeBSD

Running this on OSX Sierra 10.12

Repeating spaces should not be counted as one space.

Upvotes: 1

Views: 2698

Answers (4)

Barmar

Reputation: 782345

The -c option counts the number of lines that match, not individual matches. Use grep -o instead, which prints every match on its own line, and pipe the result to wc -l to count those lines.

grep -o ' ' test | wc -l
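
To see the difference between the two counts, here is a small sketch that rebuilds the sample data from the question in a temporary file. The variable names (sample, line_count, space_count) are illustrative and not part of the question.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' 'musically us' 'challenged a goat that day' \
    'spartacus was his name' 'ba ba ba blacksheep' > "$sample"

# -c counts matching lines: each of the 4 lines contains at least one space.
line_count=$(grep -c ' ' "$sample")

# -o prints each match on its own line, so wc -l counts individual spaces.
# The arithmetic expansion strips the padding that BSD wc adds to its output.
space_count=$(( $(grep -o ' ' "$sample" | wc -l) ))

echo "lines containing a space: $line_count"
echo "individual spaces: $space_count"

rm -f "$sample"
```

On the sample data this should report 4 matching lines and 11 spaces, which matches the numbers in the question.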

Upvotes: 0

Ed Morton

Reputation: 204488

Just use awk:

$ awk -v RS=' ' 'END{print NR-1}' file
11

or if you want to handle empty files gracefully:

$ awk -v RS=' ' 'END{print NR - (NR?1:0)}' /dev/null
0

Upvotes: 0

l'L'l

Reputation: 47284

tr is better suited to this in most cases:

tr -d -C ' ' <file | wc -c

The grep solution depends on grep -o printing one match per line. That breaks down when the pattern can match a run of several spaces, because the whole run is then counted once:

v='fifteen-->               <--spaces'

echo "$v" | grep -o -E ' +' | wc -l

echo "$v" | tr -d -C ' ' | wc -c

The grep pipeline reports 1, when the correct count is 15.

EDIT: If you want to count several different characters (e.g. TAB and SPACE), you could use:

tr -dC $' \t' <<< $'one \t' | wc -c
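
The tr approach can be wrapped in a small helper. This is only a sketch: the function name count_chars is hypothetical, and it assumes the characters being counted are single-byte, since wc -c counts bytes.

```shell
# count_chars SET: count how many characters read from stdin belong to SET.
# tr -dc deletes everything that is NOT in SET, so wc -c counts what is left.
count_chars() {
    # Arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(tr -dc "$1" | wc -c) ))
}

# A literal tab, built portably without relying on $'...' quoting.
tab=$(printf '\t')

# %15s with an empty argument expands to exactly fifteen spaces.
printf 'fifteen-->%15s<--spaces\n' '' | count_chars ' '

# Count spaces and tabs together by listing both characters in SET.
printf 'one \t\n' | count_chars " $tab"
```

The first call should print 15 and the second 2. The trailing newline is never counted because it is not in the set.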

Upvotes: 2

George Vasiliou

Reputation: 6345

If you are open to tricks and alternatives, you might like this one:

$ awk '{print --NF}' <(tr -d '\n' <file)
11

The solution above counts runs of whitespace between words, not individual spaces. As a result, for the string 'fifteen-->               <--spaces' awk reports 1, like grep.

If you need to count each individual space, you can use this:

$ awk -F"[ ]" '{print --NF}' <<<"fifteen-->               <--spaces"
15
$ awk -F"[ ]" '{print --NF}' <<<"  2  4  6  8  10"
10
$ awk -F"[ ]" '{print --NF}' <(tr -d '\n' <file)
11

Going one step further, to count individual spaces and tabs:

$ awk -F"[ ]|\t" '{print --NF}' <(echo -e "  2  4  6  8  10\t12  14")
13
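
A variant of the same idea that sums the count line by line, instead of joining the lines with tr first, might look like the sketch below. The function name count_blanks is hypothetical, and the guard for empty lines is an addition that the commands above do not need because they work on a single joined line.

```shell
# count_blanks FILE: total number of spaces and tabs in FILE.
count_blanks() {
    # Every space or tab acts as a field separator, so a non-empty line
    # with NF fields contains NF-1 separators. Empty lines have NF == 0
    # and must contribute 0 rather than -1.
    awk -F'[ \t]' '{ n += (NF ? NF - 1 : 0) } END { print n + 0 }' "$1"
}

# Example input: the 13-separator line from above, an empty line,
# and a line with a single space.
demo=$(mktemp)
printf '  2  4  6  8  10\t12  14\n\nfoo bar\n' > "$demo"
count_blanks "$demo"
rm -f "$demo"
```

For this input the function should print 14: 13 from the first line, none from the empty line, and 1 from the last.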

Upvotes: 3
