octosquidopus

Reputation: 3713

How to sort groups of lines?

In the following example, there are 3 elements that have to be sorted:

  1. "[aaa]" and the 4 lines (always 4) below it form a single unit.
  2. "[kkk]" and the 4 lines (always 4) below it form a single unit.
  3. "[zzz]" and the 4 lines (always 4) below it form a single unit.

Only groups of lines following this pattern should be sorted; anything before "[aaa]" and after the 4th line of "[zzz]" must be left intact.

from:

This sentence and everything above it should not be sorted.

[zzz]
some
random
text
here
[aaa]
bla
blo
blu
bli
[kkk]
1
44
2
88

And neither should this one and everything below it.

to:

This sentence and everything above it should not be sorted.

[aaa]
bla
blo
blu
bli
[kkk]
1
44
2
88
[zzz]
some
random
text
here

And neither should this one and everything below it.

Upvotes: 3

Views: 2846

Answers (3)

potong

Reputation: 58381

This might work for you (GNU sed & sort):

sed -i.bak '/^\[/!b;N;N;N;N;s/\n/UnIqUeStRiNg/g;w sort_file' file
sort -o sort_file sort_file
sed -i -e '/^\[/!b;R sort_file' -e 'd' file
sed -i 's/UnIqUeStRiNg/\n/g' file

The sorted result ends up in file, and the original is kept in file.bak.

This outputs every line beginning with [ together with the 4 lines that follow it, in sorted order.

UnIqUeStRiNg can be any string that does not occur in the file and does not contain a newline, e.g. \x00.
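
Before running the commands above, it may be worth checking that the marker really is absent from the file. A minimal sketch, assuming a grep that supports -F and -q; marker_is_free is a hypothetical helper name, not part of the answer:

```shell
# marker_is_free MARKER FILE (hypothetical helper)
# Succeeds only if MARKER does not occur anywhere in FILE.
marker_is_free() {
  # -F: treat the marker as a fixed string, -q: no output, exit status only
  ! grep -qF -- "$1" "$2"
}
```

It could be used as `marker_is_free UnIqUeStRiNg file || echo "pick another marker" >&2` before the first sed command.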

Upvotes: 0

anishsane

Reputation: 20980

Assuming that other lines do not contain a [ in them:

header=$(grep -n 'This sentence and everything above it should not be sorted.' sortme.txt | cut -d: -f1)
footer=$(grep -n 'And neither should this one and everything below it.' sortme.txt | cut -d: -f1)

head -n "$header" sortme.txt #print header

head -n $(( footer - 1 )) sortme.txt | tail -n +$(( header + 1 )) | tr '\n[' '[\n' | sort | tr '\n[' '[\n' | grep -v '^\[$' #sort lines between header & footer

tail -n +"$footer" sortme.txt #print footer

This serves the purpose.

Note that only the fourth command does the actual sorting; the other commands just preserve the header and footer.

I am also assuming that there are no other lines between the header and the first "[section]".
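
To see what the tr/sort/tr pipeline does on its own, here is a small sketch that wraps it in a function reading standard input. sort_blocks is a made-up name, and the sketch assumes the input starts with a "[section]" line and contains no other "[" characters:

```shell
# sort_blocks (hypothetical name): sort "[section]" blocks read from stdin.
sort_blocks() {
  # 1st tr: swap newline and "[" so every block becomes one line
  # sort:   order those one-line blocks
  # 2nd tr: swap back to restore the original line structure
  # grep:   drop the stray "[" line left over at the very end
  tr '\n[' '[\n' | sort | tr '\n[' '[\n' | grep -v '^\[$'
}
```

For example, `printf '[b]\n2\n[a]\n1\n' | sort_blocks` prints the [a] block before the [b] block.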

Upvotes: 1

rici

Reputation: 241701

Maybe not the fastest :) [1] but it will do what you want, I believe:

for line in $(grep -n '^\[.*\]$' sections.txt |
              sort -k2 -t: |
              cut -f1 -d:); do
  tail -n +$line sections.txt | head -n 5
done

Here's a better one:

for pos in $(grep -b '^\[.*\]$' sections.txt |
             sort -k2 -t: |
             cut -f1 -d:); do
  tail -c +$((pos+1)) sections.txt | head -n 5
done

[1] The first one is something like O(N^2) in the number of lines in the file, since it has to read all the way to the section for each section. The second one, which can seek immediately to the right character position, should be closer to O(N log N).

[2] Both loops take you at your word that there are always exactly five lines in each section (the header plus the four lines that follow), hence head -n 5. However, it would be really easy to replace that with something that reads up to but not including the next line starting with a '[', in case that ever turns out to be necessary.
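
As a rough idea of what the replacement mentioned in [2] could look like, here is a sketch using awk. print_section is a hypothetical name. Note the limitation: the last section has no following '[' line, so it would run on into whatever text comes after it unless some other terminator is added.

```shell
# print_section (hypothetical name): copy stdin up to, but not including,
# the next line that starts with "[" after the first line.
print_section() {
  awk 'NR > 1 && /^\[/ { exit } { print }'
}

# Possible use inside the loop, in place of "head -n 5":
#   tail -c +$((pos+1)) sections.txt | print_section
```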


Preserving start and end requires a bit more work:

# Find all the sections (each entry is "byte_offset:[name]", newline kept)
mapfile indices < <(grep -b '^\[.*\]$' sections.txt)
# Output the prefix
head -c "${indices[0]%%:*}" sections.txt
# Output sections, as above
for pos in $(printf %s "${indices[@]}" |
             sort -k2 -t: |
             cut -f1 -d:); do
  tail -c +$((pos+1)) sections.txt | head -n 5
done
# Output the suffix
tail -c +$((1+${indices[-1]%%:*})) sections.txt | tail -n +6

You might want to make a function out of that, or a script file, changing sections.txt to $1 throughout.
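
One way that function might look, as a sketch rather than a tested tool. sort_sections is a made-up name; it keeps the grep output in a plain variable instead of using mapfile so that it also runs under sh, and it still assumes five-line sections plus a grep with -b and a head with -c:

```shell
# sort_sections FILE (hypothetical name)
# Prints FILE with its 5-line "[name]" sections sorted by header; the text
# before the first section and after the last one is copied unchanged.
# The body runs in a subshell, so its variables do not leak to the caller.
sort_sections() (
  file=$1
  # One "byte_offset:[name]" line per section header
  index=$(grep -b '^\[.*\]$' "$file")
  if [ -z "$index" ]; then
    cat "$file"        # no sections at all: nothing to sort
    return 0
  fi
  first=$(printf '%s\n' "$index" | head -n 1 | cut -d: -f1)
  last=$(printf '%s\n' "$index" | tail -n 1 | cut -d: -f1)
  # Prefix: everything before the first header
  head -c "$first" "$file"
  # Sections, ordered by header text
  for pos in $(printf '%s\n' "$index" | sort -t: -k2 | cut -d: -f1); do
    tail -c +"$((pos + 1))" "$file" | head -n 5
  done
  # Suffix: everything after the five lines of the last section in the file
  tail -c +"$((last + 1))" "$file" | tail -n +6
)
```

Called as `sort_sections sections.txt > sorted.txt`, it writes the result to a new file and leaves the input alone.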

Upvotes: 1
