Reputation: 15685
I have a file containing lists and sublists, and I want to extract the longest sublist using command-line tools.
File example:
* Item1
** SubItem1
** ...
** SubItemN
* Item2
** SubItem1
** ...
** SubItemN
* ...
** ...
* ItemN
** SubItem1
** ...
** SubItemN
I would like to know whether this can be done easily; otherwise I will write a Perl script.
Upvotes: 0
Views: 197
Reputation: 85865
$ cat file
* letters
** a
** b
** b
** d
** e
** f
* colors
** red
** green
** blue
* numbers
** 1
** 2
** 3
** 4
** 5
Show the length of each sublist by reversing the file with tac and using awk:
$ tac file | awk '/^\*\*/{c++}/^\*[^*]/{print c,$2;c=0}'
5 numbers
3 colors
6 letters
Print length of largest sublist only:
$ tac file | awk '/^\*\*/{c++}/^\*[^*]/{if(c>m){m=c;l=$2}c=0}END{print m,l}'
6 letters
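To print the longest sublist itself rather than only its length, one option is a single forward pass that buffers the subitems of each group. This is a sketch rather than part of the commands above: the names `longest_sublist` and `keep_best` are made up for illustration, and on a tie the first group seen is kept.

```shell
# Hypothetical helper: print the header and subitems of the longest sublist.
longest_sublist() {
    awk '
        # New top-level item: settle the previous group, then start a fresh buffer.
        /^\*[^*]/ { keep_best(); head = $0; buf = ""; c = 0; next }
        # Subitem: append it to the current buffer and count it.
        /^\*\*/   { buf = buf $0 "\n"; c++ }
        END       { keep_best(); if (best_head != "") printf "%s\n%s", best_head, best }
        # Remember the current group if it is strictly longer than the best so far.
        function keep_best() {
            if (c > m) { m = c; best_head = head; best = buf }
        }
    ' "$1"
}
```

Calling `longest_sublist file` on the sample file should print the `* letters` line followed by its six subitems.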
Upvotes: 1
Reputation: 247012
A Perl one-liner (-00 reads the file in paragraph mode, so it assumes the groups are separated by blank lines):
perl -00 -ne '$n=tr/\n/\n/; if ($n>$m) {$m=$n; $max=$_}; END {print $max}' file
Just using bash:
max=0
while read -r bullet thingy; do
    case $bullet in
        "*")  item=$thingy; count=0 ;;    # new top-level item: reset the counter
        "**") ((count++)) ;;              # subitem: count it
        "")   (( count > max )) && { max_item=$item; max=$count; } ;;    # blank line: group ended
    esac
done < <(cat file; echo)
echo "$max_item $max"
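The <(cat file; echo) part ensures that there is a blank line after the last line of the file, so that the last sublist group can be compared against the max. The comparison only runs when an empty line is read, so the groups are assumed to be separated by blank lines.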
That only keeps the count. To save the items in the biggest sublist:
max=0
while read -r bullet thingy; do
    case $bullet in
        "*")  item=$thingy; sublist=() ;;    # new top-level item: start an empty sublist
        "**") sublist+=("$thingy") ;;        # quoted so a subitem is never split or globbed
        "")   if (( ${#sublist[@]} > max )); then
                  max=${#sublist[@]}
                  max_item=$item
                  max_sublist=("${sublist[@]}")
              fi
              ;;
    esac
done < <(cat file; echo)
printf "%s\n" "$max_item" "${#max_sublist[@]}" "${max_sublist[@]}"
Using sudo_O's example, this outputs:
letters
6
a
b
b
d
e
f
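If the file has no blank lines between groups, a variant of the same loop can do the comparison whenever a new `*` item starts, with a trailing `*` sentinel appended to close the last group. This is only a sketch under that assumption; the function name `longest_group` is made up for illustration, and ties go to the first group seen.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the item name, the sublist length, and the subitems
# of the longest group. Blank lines in the input are simply ignored.
longest_group() {
    local bullet thingy item="" max=0 max_item=""
    local -a sublist=() max_sublist=()
    while read -r bullet thingy; do
        case $bullet in
            "*")    # a new item closes the previous group: compare it, then reset
                if (( ${#sublist[@]} > max )); then
                    max=${#sublist[@]}
                    max_item=$item
                    max_sublist=("${sublist[@]}")
                fi
                item=$thingy
                sublist=()
                ;;
            "**") sublist+=("$thingy") ;;
        esac
    done < <(cat "$1"; echo; echo '*')   # the extra "*" line closes the last group
    printf '%s\n' "$max_item" "$max" "${max_sublist[@]}"
}
```

The extra `echo` before the sentinel guards against a file whose last line has no trailing newline.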
Upvotes: 3
Reputation: 4319
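# Line numbers of the top-level items go to tmp; the distances between them go to tmp2
grep -nE '^\*[^*]' file.txt | cut -d ":" -f 1 | tee tmp | awk 'NR==1{s=$1;next} {print $1-s;s=$1}' > tmp2
# Length of the last group: from its header line to the end of the file
echo $(( $(wc -l < file.txt) - $(tail -n 1 tmp) + 1 )) >> tmp2
res=$(paste tmp tmp2 | sort -nrk 2,2 | head -n 1)
line=$(echo "$res" | cut -f 1)
ln=$(echo "$res" | cut -f 2)
tail -n +"$line" file.txt | head -n "$ln"
rm tmp tmp2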
There is definitely a shorter solution :)
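One shorter sketch along the same lines (find the line range of the longest group, then print that range) avoids the temporary files. It is an untested-on-real-data illustration: the function name `print_longest` is made up, the variable `range` is left global for portability, and ties go to the first group seen.

```shell
# Hypothetical helper: print the longest group (header line plus subitems).
print_longest() {
    # awk emits "start,end" for the group with the most "**" lines.
    range=$(awk '
        /^\*[^*]/      { start = NR; next }
        /^\*\*/ && start {
            n[start]++
            if (n[start] > max) { max = n[start]; s = start; e = NR }
        }
        END { if (max) print s "," e }
    ' "$1")
    # sed prints exactly that line range; nothing is printed if no group was found.
    if [ -n "$range" ]; then
        sed -n "${range}p" "$1"
    fi
}
```

Usage would be `print_longest file.txt`.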
Upvotes: 0