K.Dote

Reputation: 31

How to count the number of numbers/letters in file?

I am trying to count the number of letters and the number of digits in a file in Bash. I know that I can use wc -c file to count the number of characters, but how can I restrict the count to only letters, and separately to only numbers?

Upvotes: 3

Views: 5930

Answers (5)

dnit13

Reputation: 2496

To count the number of letters and digits you can combine grep with wc. grep -Eo prints each match on its own line, so wc -l gives the number of matches:

 grep -Eo '[a-zA-Z]' myfile | wc -l
 grep -Eo '[0-9]' myfile | wc -l

With a little tweaking you can modify it to count whole numbers, alphabetic words, or alphanumeric words instead:

grep -Eo '[a-zA-Z]+' myfile | wc -w
grep -Eo '[0-9]+' myfile | wc -w
grep -Eo '[[:alnum:]]+' myfile | wc -w
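As a quick sanity check, here is a demo on a made-up sample file (the file name and contents are hypothetical). Since grep -Eo prints one match per line, wc -l counts matches:

```shell
# Hypothetical sample file
printf 'abc 123\nXYZ 45\n' > /tmp/sample.txt

# Count individual letters (upper and lower case) and individual digits
letters=$(grep -Eo '[a-zA-Z]' /tmp/sample.txt | wc -l)
digits=$(grep -Eo '[0-9]' /tmp/sample.txt | wc -l)

# Arithmetic expansion strips any padding some wc implementations add
echo "$((letters)) letters, $((digits)) digits"
```

For this input the pipeline finds 6 letters (a, b, c, X, Y, Z) and 5 digits (1, 2, 3, 4, 5).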

Upvotes: 1

Mureinik

Reputation: 312219

You can use tr to preserve only alphanumeric characters by combining the -c (complement) and -d (delete) flags. From there on, it's just a question of some piping:

$ cat myfile.txt | tr -cd '[:alnum:]' | wc -c
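A minimal check of the complement-delete behavior, with made-up input (note the character class is quoted so the shell does not try to glob it):

```shell
# Delete everything that is NOT alphanumeric, then count the remaining bytes.
# The space, the '!' and the newline are all removed, leaving "a1b2" (4 bytes).
count=$(printf 'a1 b2!\n' | tr -cd '[:alnum:]' | wc -c)
echo "$count"
```

Because tr also deletes the newline here, wc -c reports exactly the number of alphanumeric characters, with no off-by-one.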

Upvotes: 0

Jens

Reputation: 72746

Here's a way that avoids pipes entirely, using just tr and the shell's ${#variable} expansion, which gives the length of a variable:

$ cat file
123 sdf
231 (3)
huh? 564
242 wr =!
$ NUMBERS=$(tr -dc '[:digit:]' < file)
$ LETTERS=$(tr -dc '[:alpha:]' < file)
$ ALNUM=$(tr -dc '[:alnum:]' < file)
$ echo ${#NUMBERS} ${#LETTERS} ${#ALNUM}
13 8 21
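The same ${#variable} trick works on any string, not just file contents; a small sketch with a made-up string (the here-string <<< is a bashism):

```shell
s='abc 12 def 3'

# Keep only digits; command substitution also strips trailing newlines,
# so the length is exactly the digit count
digits=$(tr -dc '[:digit:]' <<< "$s")
echo "${#digits}"
```

Here $digits is "123", so the script prints 3.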

Upvotes: 2

David C. Rankin

Reputation: 84642

There are a number of ways to approach analyzing the line, word, and character frequency of a text file in bash. Using the POSIX character classes (e.g. [:upper:], and so on), you can drill down to the frequency of each character type in a text file. Below is a simple script that reads from stdin, prints the normal wc output as its first line, and then outputs the number of upper-case, lower-case, digit, punctuation and whitespace characters.

#!/bin/bash

declare -i lines=0
declare -i words=0
declare -i chars=0
declare -i upper=0
declare -i lower=0
declare -i digit=0
declare -i punct=0

oifs="$IFS"

# Read line with new IFS, preserve whitespace
while IFS=$'\n' read -r line; do

    # parse line into words with original IFS
    IFS=$oifs
    set -- $line
    IFS=$'\n'

    # Add up lines, words, chars, upper, lower, digit
    lines=$((lines + 1))
    words=$((words + $#))
    chars=$((chars + ${#line} + 1))
    for ((i = 0; i < ${#line}; i++)); do
        [[ ${line:i:1} =~ [[:upper:]] ]] && ((upper++))
        [[ ${line:i:1} =~ [[:lower:]] ]] && ((lower++))
        [[ ${line:i:1} =~ [[:digit:]] ]] && ((digit++))
        [[ ${line:i:1} =~ [[:punct:]] ]] && ((punct++))
    done
done

echo " $lines $words $chars"
echo " upper: $upper,  lower: $lower,  digit: $digit,  punct: $punct,  \
whitespace: $((chars-upper-lower-digit-punct))"

Test Input

$ cat dat/captnjackn.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)

Example Use/Output

$ bash wcount3.sh <dat/captnjackn.txt
 5 21 108
 upper: 12,  lower: 68,  digit: 4,  punct: 3,  whitespace: 21

You can customize the script to give you as little or as much detail as you like. Let me know if you have any questions.

Upvotes: 0

Saqib Rokadia

Reputation: 649

You can use sed to delete all characters that are not of the kind you are looking for, and then count the characters of the result with wc.

# 1h;1!H appends every line to the hold space, so the substitution
# can also remove the newline characters between lines
sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' myfile | wc -c

It's easy enough to do just numbers as well:

sed -n '1h;1!H;${;g;s/[^0-9]//g;p;}' myfile | wc -c

Or why not both:

sed -n '1h;1!H;${;g;s/[^0-9a-zA-Z]//g;p;}' myfile | wc -c
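One caveat: wc -c also counts the trailing newline that sed's p command prints, so the result comes out one higher than the strict character count. A quick check with made-up input:

```shell
# The input contains 4 letters (a, b, c, d), but wc -c reports 5,
# because the newline printed by sed's p command is counted too
count=$(printf 'ab 12\ncd!\n' | sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' | wc -c)
echo "$count"
```

If the exact count matters, subtract one, or strip the newline with tr -d '\n' before wc -c.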

Upvotes: 0
