powerrox

Reputation: 1342

Count number of lines under each header in a text file using bash shell script

I can do this easily in Python or some other high-level language. What I am interested in is doing this with bash.

Here is the file format:

head-xyz
item1
item2
item3
head-abc
item8
item5
item6
item9

What I would like to do is print the following output:

head-xyz: 3
head-abc: 4

Headers follow a specific pattern, as in the example above, and so do items. I am only interested in the count of items under each header.

Upvotes: 1

Views: 1025

Answers (3)

hek2mgl

Reputation: 158160

You can use awk:

awk '/^head-/{h=$0}{c[h]++}END{for(i in c)print i ": " (c[i]-1)}' input.file

Breakdown:

  • /^head-/{h=$0}

    For every line that starts with head-, store the line in the variable h as the current header.

  • {c[h]++}

    For every line in the file, including the header line itself, increment c[h]. The array c maps each header string to the number of lines seen in its section.

  • END{for(i in c)print i ": " (c[i]-1)}

    At the end, loop over the keys of c and print each key (header) followed by its value (count). Subtract one so that the header line itself is not counted. Note that awk does not guarantee the order in which for (i in c) visits the keys, so the headers may not print in file order.
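
If the output has to follow the order of the headers in the file, one option is to record each header the first time it appears. The following is only a sketch under that assumption, not part of the original answer; the function name count_items and the arrays order and n are made up for illustration, and blank lines are skipped via the NF test.

```shell
# Hypothetical wrapper: print "header: count" for each header in file "$1",
# in the order the headers first appear.
count_items() {
    awk '
        /^head-/ {
            h = $0
            # Remember first-seen order, since "for (k in c)" order is unspecified.
            if (!(h in c)) { order[++n] = h; c[h] = 0 }
            next
        }
        # Count non-blank lines that appear after some header.
        h != "" && NF { c[h]++ }
        END {
            for (k = 1; k <= n; k++) print order[k] ": " c[order[k]]
        }
    ' "$1"
}
```

Called as count_items input.file on the sample from the question, this prints head-xyz: 3 followed by head-abc: 4.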

Upvotes: 5

Jonathan Ross

Reputation: 550

If you don't consider sed a high-level language, here's another approach. It assumes each header and its items are stored in a separate file whose name starts with head-:

for file in head-*; do
    # printf is used because echo's handling of "\c" is not portable
    printf '%s: ' "$file"
    sed -n '/^head-/,${
        /^head-/d
        /^item[0-9]/!q
        p
    }
    ' <"$file" | wc -l
done

In English, the sed script does the following:

  • Don't print by default
  • Within lines matching /^head-/ to end of file
    • Delete the "head line"
    • After that, quit if you find a non-item line
    • Otherwise, print the line

wc -l then counts the lines that sed printed.
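
For the single-file layout shown in the question, the same sed-plus-wc idea could be applied once per header. This is a rough sketch rather than the answer's own method: count_under_header and count_all are hypothetical names, the header text is assumed to contain no sed regex metacharacters, and the file is read once for every header, so it is slow on large inputs.

```shell
# Hypothetical helper: count the lines that follow header "$1" in file "$2",
# stopping at the next line that starts with "head-".
count_under_header() {
    sed -n "/^$1\$/,\${
        /^$1\$/d
        /^head-/q
        p
    }" "$2" | wc -l | tr -d ' '
}

# Hypothetical driver: list the headers with grep, then count under each one.
count_all() {
    grep '^head-' "$1" | while IFS= read -r header; do
        printf '%s: %s\n' "$header" "$(count_under_header "$header" "$1")"
    done
}
```

The tr -d ' ' strips the leading padding that some wc implementations add. Running count_all input.file on the question's sample prints head-xyz: 3 and then head-abc: 4.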

Upvotes: 0

Michal Gasek

Reputation: 6423

Note: requires Bash 4 or later (uses associative arrays)

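#!/usr/bin/env bash

FILENAME="$1"
declare -A CNT   # maps each header line to the number of item lines below it

# "|| [[ -n $LINE ]]" also processes a final line that lacks a trailing newline
while IFS= read -r LINE || [[ -n $LINE ]]
do
    if [[ $LINE =~ ^head ]]; then
        # Header line: remember it and start its count at zero if it is new
        HEADLINE="$LINE"
        [[ ${CNT[$HEADLINE]+_} ]] || CNT[$HEADLINE]=0
    elif [[ -n $HEADLINE ]]; then
        # Item line: add one to the current header's count
        CNT[$HEADLINE]=$(( ${CNT[$HEADLINE]} + 1 ))
    fi
done < "$FILENAME"

# Key order of an associative array is unspecified
for i in "${!CNT[@]}"; do echo "$i: ${CNT[$i]}"; done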

Output:

$ bash countitems.sh input
head-abc: 4
head-xyz: 3
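
If Bash 4 is not available, or the output needs to follow the file order, a streaming variant can avoid associative arrays by printing each count as soon as the next header is reached. This is a minimal sketch under the assumption that headers start with head- and that blank lines should be ignored; the function name count_items_stream is hypothetical.

```shell
# Hypothetical function: print "header: count" for each section of file "$1",
# keeping only the current header and a running counter in memory.
count_items_stream() {
    local line header="" count=0
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == head-* ]]; then
            # A new header closes the previous section, so report it now.
            if [[ -n $header ]]; then
                printf '%s: %d\n' "$header" "$count"
            fi
            header=$line
            count=0
        elif [[ -n $header && -n $line ]]; then
            # Non-blank line under a header: count it as an item.
            count=$(( count + 1 ))
        fi
    done < "$1"
    # Report the last section, which has no header after it.
    if [[ -n $header ]]; then
        printf '%s: %d\n' "$header" "$count"
    fi
}
```

Unlike the associative-array version, a header that appears twice is reported twice here, once per section, rather than being merged into a single total.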

Does this answer your question, @powerrox?

Upvotes: 3
