aL3xa

Reputation: 36080

Count occurrences after a string match with bash text utilities

I'm trying to reorganise my desktop with a conky config. Since I'm a big fan of org-mode in Emacs, I'd like to extract the tasks from the org file where I keep my daily tasks and display them in conky. Say I have a file like this:

* first item
** subitem
** subitem
** subitem
* second item
** subitem
** subitem
* third item
** subitem
** subitem
** subitem
** subitem

I'd like to create a summary of my tasks that finds every task beginning with * and counts the ** items that follow it. Then I'd like to present that in a suitable manner:

* first item [3]
* second item [2]
* third item [4]

While I can find occurrences of strings beginning with only one * with grep:

grep "^\\* " foo.org

and I can count occurrences of ** with:

grep -c "^\\*\{2\}" foo.org

How can I achieve the desired result? Of course, one can use Python, or Ruby, but I'd like to stick with bash utilities only.

Upvotes: 5

Views: 1097

Answers (2)

ripat

Reputation: 3236

On the sample file you gave:

awk '$1=="*"{sub(/^\* /,"");p=$0;c[p]+=0} $1=="**"{c[p]++} END{for(i in c) printf "* %s [%d]\n", i, c[i]}' foo.org

That returns the desired counts, although awk does not guarantee the order in which for (i in c) visits the headings:

* second item [2]
* first item [3]
* third item [4]

If you need it sorted, pipe the result into sort (here on the second field, the first word of the heading):

awk command | sort -k2,2
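
If the headings should come out in file order rather than alphabetically, one option is to skip the array and print each heading as soon as the next one starts. A minimal sketch, assuming a POSIX awk; the function name summarize_org and the temporary sample file are illustrative and not part of the command above:

```shell
#!/bin/bash
# Hypothetical helper: print each top-level heading with its subitem count,
# in the order the headings appear in the input.
summarize_org() {
  awk '
    # Print the heading collected so far, if there is one.
    function emit() { if (have) printf "%s [%d]\n", heading, count }
    /^\* /   { emit(); heading = $0; count = 0; have = 1; next }
    /^\*\* / { if (have) count++ }
    END      { emit() }
  ' "$@"
}

# Sample data matching the question (temporary file, removed afterwards).
sample=$(mktemp)
printf '%s\n' '* first item' '** subitem' '** subitem' '** subitem' \
  '* second item' '** subitem' '** subitem' \
  '* third item' '** subitem' '** subitem' '** subitem' '** subitem' > "$sample"

summary=$(summarize_org "$sample")
rm -f "$sample"
printf '%s\n' "$summary"
```

Because nothing is stored per heading, no sort step is needed and repeated heading names are reported separately.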

Upvotes: 2

PhilR

Reputation: 5592

It wouldn't be my first choice, but you can do this in pure bash (no forks):

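#!/bin/bash
# Reads the org file on stdin, e.g.: ./summary.sh < foo.org

set -u
unset HEADING LINE COUNT
COUNT=0
# Keep the patterns in variables: since bash 3.2, a quoted right-hand side
# of =~ is matched as a literal string, not as a regex.
HEADING_RE='^\* '
SUBITEM_RE='^\*\* '
while IFS= read -r LINE; do
  if [[ "$LINE" =~ $HEADING_RE ]]; then
    # print the previous heading, if there is one
    declare -p HEADING > /dev/null 2>&1 && echo "$HEADING [${COUNT}]"

    HEADING=$LINE
    COUNT=0
  elif [[ "$LINE" =~ $SUBITEM_RE ]]; then
    COUNT=$((COUNT + 1))
  else
    echo "Unexpected input" 1>&2
  fi
done
# print the last heading, unless the input had none
declare -p HEADING > /dev/null 2>&1 && echo "$HEADING [${COUNT}]"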

Things to point out:

  • [[ ... =~ ... ]] is a bash extension allowing regex matches
  • declare -p is used to test for variable existence
  • The script will do funny things if the input isn't as described, e.g. empty lines, lines without the * or ** prefix
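
The last point can be softened by skipping lines that are neither headings nor subitems instead of reporting them. A minimal sketch under that assumption; the function name count_subitems and the inline sample are illustrative. The patterns are kept in variables because, from bash 3.2 onwards, a quoted right-hand side of =~ is matched literally:

```shell
#!/bin/bash
# Hypothetical variant of the loop: lines that are neither "* " headings nor
# "** " subitems (blank lines, body text) are skipped instead of reported.
count_subitems() {
  local line heading='' count=0
  # Patterns live in variables so that =~ treats them as regexes.
  local heading_re='^\* ' subitem_re='^\*\* '
  while IFS= read -r line; do
    if [[ $line =~ $heading_re ]]; then
      # A new heading starts: print the previous one, if any.
      [[ -n $heading ]] && printf '%s [%d]\n' "$heading" "$count"
      heading=$line
      count=0
    elif [[ -n $heading && $line =~ $subitem_re ]]; then
      count=$((count + 1))
    fi
  done
  # Print the last heading, unless the input had none.
  if [[ -n $heading ]]; then
    printf '%s [%d]\n' "$heading" "$count"
  fi
}

# Sample input with a blank line and a line of body text mixed in.
result=$(printf '%s\n' '* first item' '** subitem' '' \
  'some notes that are not a task' '** subitem' '* second item' | count_subitems)
printf '%s\n' "$result"
```

Here the non-empty heading string doubles as the "have we seen a heading yet" flag, which is safe because a matched heading always starts with "* ".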

Upvotes: 1
