Reputation: 358
Here is a test sample file, rime.txt.
1. To count all the words in the file:
wc -w rime.txt
4081 rime.txt
awk 'BEGIN{num=0}{split($0, A);n=length(A);num=num+n;}END{print num}' rime.txt
4081
grep -Ec '\w' rime.txt
672
Why is the total 672 with grep?
How can I count the words with sed?
2. To count the words per line:
awk '{split($0, A);print length(A)}' rime.txt
How can I do that with sed?
Upvotes: 0
Views: 3199
Reputation: 37404
If you want to use grep for the job, first form a regexp that resembles a word. I'll just use one or more characters from the class [a-zA-Z'-] and let you figure out a better one. Then use grep -o for matching:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
And finally count the matches with wc -l:
$ grep -oE "[a-zA-Z'-]+" rime.txt | wc -l
4090
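To see the difference between counting lines and counting matches, here is a minimal sketch on a tiny made-up sample. The file contents and the variable names are hypothetical, and the regexp is only the rough word pattern from above, not a definitive one.

```shell
# Hypothetical sample: three non-blank lines, one blank line, five words.
sample=$(mktemp)
printf 'hello vipin\nkumar\n\ngood night\n' > "$sample"

# -c counts matching LINES: every line that contains at least one match.
line_count=$(grep -Ec "[a-zA-Z'-]+" "$sample")

# -o prints each match on its own line, so wc -l counts MATCHES.
match_count=$(grep -oE "[a-zA-Z'-]+" "$sample" | wc -l)

echo "lines: $line_count, matches: $match_count"
rm -f "$sample"
```

On this sample it should report 3 lines and 5 matches, which is the same kind of gap as 672 versus 4081 in the question.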
Upvotes: 0
Reputation: 203483
grep is counting lines, not words, and you would never use sed for this because sed is for simple substitutions on individual lines; that is all.
Also, those awk scripts are ridiculous. The correct way to write the first one would be
awk '{num+=NF} END{print num+0}'
or, with GNU awk,
awk -v RS='[[:space:]]+' 'END{print NR+0}'
and the second one is just
awk '{print NF}'
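As a rough sketch of how the NF-based versions behave, here they are on a small made-up file. The sample text and the variable names are hypothetical; only the two awk programs come from the answer.

```shell
# Hypothetical sample: five words on three non-blank lines, plus one blank line.
sample=$(mktemp)
printf 'hello vipin\nkumar\n\ngood night\n' > "$sample"

# NF is the number of whitespace-separated fields on the current line.
# "num + 0" forces a numeric 0 instead of an empty string on empty input.
total=$(awk '{num += NF} END {print num + 0}' "$sample")

# One count per input line; a blank line has NF == 0.
per_line=$(awk '{print NF}' "$sample" | tr '\n' ' ')

echo "total: $total"
echo "per line: $per_line"
rm -f "$sample"
```

There is no need for split() or length() here, because awk has already split each line into fields by the time the action runs.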
Upvotes: 4
Reputation: 3137
To clear up your doubt about the missing words, take one small example:
$ cat ff
hello vipin
kumar
good night
Clearly, that is 3 lines with 5 words.
Try wc -w first:
$ wc -w ff
5 ff
And now the grep command that you used:
$ grep -Ec '\w' ff
3
In your case, the total line count is:
$ wc -l < file.txt
833
The blank line count is:
$ grep '^$' file.txt | wc -l
161
The non-blank line count is:
$ grep -v '^$' file.txt | wc -l
672
That is why you are seeing 672: it is the number of non-blank lines.
$ echo $(expr 833 - 161)
672
As the expert has already mentioned, you shouldn't use sed for this operation, and grep '\w' with -c gives you the line count, not the word count.
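The same arithmetic can be checked on any file. Below is a minimal sketch using a made-up five-line sample; the file contents and variable names are hypothetical, and it uses grep -c and shell arithmetic as shortcuts for the grep | wc -l and expr forms above.

```shell
# Hypothetical sample: three non-blank lines and two blank lines.
sample=$(mktemp)
printf 'hello vipin\nkumar\n\ngood night\n\n' > "$sample"

total_lines=$(wc -l < "$sample")            # all lines
blank_lines=$(grep -c '^$' "$sample")       # empty lines only
nonblank_lines=$(grep -vc '^$' "$sample")   # everything else

# total - blank should equal the non-blank count.
difference=$((total_lines - blank_lines))
echo "$total_lines - $blank_lines = $difference (non-blank: $nonblank_lines)"
rm -f "$sample"
```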
Upvotes: 1
Reputation: 659
Because it's only counting lines, not words. From the man page:
-c, --count Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
And as you can see at the link you provided, there are 834 lines and 672 SLOC (source lines of code), and that last measurement is the one grep -c '\w' reports, since it counts every line that contains at least one word character.
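One caveat, sketched below on a made-up sample: "lines containing a word character" and "non-empty lines" are only the same number when the file has no lines made up purely of spaces or punctuation. The sample and variable names are hypothetical, and this assumes a grep that supports \w, such as GNU grep.

```shell
# Hypothetical sample: two lines with words, one spaces-only line,
# one empty line, and one punctuation-only line.
sample=$(mktemp)
printf 'one two\n   \n\n---\nthree\n' > "$sample"

word_lines=$(grep -c '\w' "$sample")       # lines with a word character
nonword_lines=$(grep -vc '\w' "$sample")   # -v flips -c to non-matching lines
nonempty_lines=$(grep -vc '^$' "$sample")  # lines with any character at all

echo "word: $word_lines, non-word: $nonword_lines, non-empty: $nonempty_lines"
rm -f "$sample"
```

Here the spaces-only and punctuation-only lines are non-empty but contain no word character, so the two counts differ.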
Upvotes: 1