Reputation: 31
I am trying to count the number of letters and the number of digits in my file in Bash.
I know that I can use wc -c file
to count the number of characters, but how can I restrict it to letters only, and separately to digits only?
Upvotes: 3
Views: 5930
Reputation: 2496
To count the number of letters and numbers you can combine grep
with wc
:
grep -Eo '[a-z]' myfile | wc -w
grep -Eo '[0-9]' myfile | wc -w
With a little tweaking, you can modify it to count numeric, alphabetic, or alphanumeric words like this:
grep -Eo '[a-z]+' myfile | wc -w
grep -Eo '[0-9]+' myfile | wc -w
grep -Eo '[[:alnum:]]+' myfile | wc -w
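Note that [a-z] matches only lowercase letters; the POSIX class [[:alpha:]] covers both cases. Also, since grep -o prints one match per line, wc -l counts the matches directly. A small sketch of that variant (the file contents here are hypothetical, just for illustration):

```shell
# Hypothetical sample file
printf 'Hello 42 World\n' > myfile

# grep -o prints one match per line, so wc -l counts matches directly
grep -o '[[:alpha:]]' myfile | wc -l   # prints 10 (letters, upper and lower case)
grep -o '[[:digit:]]' myfile | wc -l   # prints 2
```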
Upvotes: 1
Reputation: 312219
You can use tr
to keep only alphanumeric characters by combining the -c
(complement) and -d
(delete) flags. From there on, it's just a question of some piping:
$ cat myfile.txt | tr -cd '[:alnum:]' | wc -c
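The same idea works for letters and digits separately, and the input can be redirected directly into tr without the extra cat. A sketch, assuming the file is named myfile.txt (sample contents below are hypothetical):

```shell
# Hypothetical sample file
printf 'Hello 42 World\n' > myfile.txt

# Same approach without the extra cat; note the class is quoted
tr -cd '[:alpha:]' < myfile.txt | wc -c   # prints 10 (letters only)
tr -cd '[:digit:]' < myfile.txt | wc -c   # prints 2 (digits only)
```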
Upvotes: 0
Reputation: 72746
Here's a way that avoids pipes completely, using just tr
and the shell's ${#variable}
expansion to get the length of a variable:
$ cat file
123 sdf
231 (3)
huh? 564
242 wr =!
$ NUMBERS=$(tr -dc '[:digit:]' < file)
$ LETTERS=$(tr -dc '[:alpha:]' < file)
$ ALNUM=$(tr -dc '[:alnum:]' < file)
$ echo ${#NUMBERS} ${#LETTERS} ${#ALNUM}
13 8 21
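As an aside, the same counts can be obtained in pure bash, without tr, using the ${var//pattern/} substitution to strip unwanted characters. A sketch using the same sample file as above:

```shell
#!/bin/bash
# Recreate the sample file from above
printf '123 sdf\n231 (3)\nhuh? 564\n242 wr =!\n' > file

s=$(<file)                    # slurp the whole file into a variable
numbers=${s//[!0-9]/}         # delete everything that is not a digit
letters=${s//[![:alpha:]]/}   # delete everything that is not a letter
echo ${#numbers} ${#letters}  # prints 13 8
```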
Upvotes: 2
Reputation: 84642
There are a number of ways to approach analyzing the line, word, and character frequency of a text file in bash. Using the POSIX character classes (e.g. [:upper:]
, and so on) in bash pattern matching, you can drill down to the frequency of each character type in a text file. Below is a simple script that reads from stdin
and provides the normal wc
output as its first line of output, and then outputs the number of upper
, lower
, digit
, punct
and whitespace
characters.
#!/bin/bash
declare -i lines=0
declare -i words=0
declare -i chars=0
declare -i upper=0
declare -i lower=0
declare -i digit=0
declare -i punct=0
oifs="$IFS"
# Read line with new IFS, preserve whitespace
while IFS=$'\n' read -r line; do
    # parse line into words with original IFS
    IFS=$oifs
    set -- $line
    IFS=$'\n'
    # Add up lines, words, chars, upper, lower, digit
    lines=$((lines + 1))
    words=$((words + $#))
    chars=$((chars + ${#line} + 1))
    for ((i = 0; i < ${#line}; i++)); do
        c=${line:i:1}
        [[ $c =~ [[:upper:]] ]] && ((upper++))
        [[ $c =~ [[:lower:]] ]] && ((lower++))
        [[ $c =~ [[:digit:]] ]] && ((digit++))
        [[ $c =~ [[:punct:]] ]] && ((punct++))
    done
done
echo " $lines $words $chars"
echo " upper: $upper, lower: $lower, digit: $digit, punct: $punct, \
whitespace: $((chars - upper - lower - digit - punct))"
Test Input
$ cat dat/captnjackn.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)
Example Use/Output
$ bash wcount3.sh <dat/captnjackn.txt
5 21 108
upper: 12, lower: 68, digit: 4, punct: 3, whitespace: 21
You can customize the script to give you as little or as much detail as you like. Let me know if you have any questions.
Upvotes: 0
Reputation: 649
You can use sed to delete all characters that are not of the kind you are looking for, and then count the characters of the result with wc.
# 1h;1!H appends every line to the hold space, so that embedded
# newline characters can also be removed
sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' myfile | wc -c
It's easy enough to just do numbers as well.
sed -n '1h;1!H;${;g;s/[^0-9]//g;p;}' myfile | wc -c
Or why not both.
sed -n '1h;1!H;${;g;s/[^0-9a-zA-Z]//g;p;}' myfile | wc -c
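One caveat: the final p prints a trailing newline, which wc -c includes in the count. A simpler per-line sketch that strips the newlines with tr before counting avoids both the hold-space juggling and that off-by-one (sample file below is hypothetical):

```shell
# Hypothetical sample file
printf 'Hello 42\nWorld!\n' > myfile

# Strip non-alphanumerics per line, then drop the newlines before counting
sed 's/[^0-9a-zA-Z]//g' myfile | tr -d '\n' | wc -c   # prints 12
```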
Upvotes: 0