carrotcakeslayer

Reputation: 1008

get chunk of a list in bash

I need to divide a list similar to this one, with over 3000 lines, into 3 chunks. I need to make the division in such a way that I can specify something like:

  1. chunk -> from words starting with "a" until words starting with "e" (including all words that start with letter "e").
  2. chunk -> from words starting with "f", until words starting with "mj" (including all words that start with "mj").
  3. chunk -> from words starting with "mk", until words starting with "z".

Example input:

about
block
echo
far
maps
mjalgo
mjprou
mksomething
november
opshacom
oscar
softball
zorro

Any ideas how to achieve this? I don't need one command to do it all, I just need to know how to write 1 command per chunk needed.

Thanks!

Upvotes: 1

Views: 472

Answers (3)

ctac_

Reputation: 2471

You can try csplit, which splits the file before the first line matching each pattern:

csplit infile /^f/ /^mk/
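To make the result of that command more concrete, here is a minimal sketch run against the sample input from the question. By default csplit names its output files xx00, xx01 and xx02; the `words.txt` file name and the `chunk` prefix below are only assumptions for illustration, and the input is assumed to be sorted already.

```shell
# Hypothetical working directory and file name, filled with the question's sample input.
tmpdir=$(mktemp -d)
words="$tmpdir/words.txt"
printf '%s\n' about block echo far maps mjalgo mjprou mksomething \
    november opshacom oscar softball zorro > "$words"

# Split before the first line starting with "f" and before the first "mk" line.
# -s suppresses the byte counts csplit normally prints; -f sets the output prefix.
csplit -s -f "$tmpdir/chunk" "$words" '/^f/' '/^mk/'

# Three pieces: chunk00 (a..e), chunk01 (f..mj), chunk02 (mk..z).
for f in "$tmpdir"/chunk0[0-2]; do
    echo "== ${f##*/}"
    cat "$f"
done
```

Because each pattern only marks where the next piece begins, all the "e" words stay in the first piece and all the "mj" words stay in the second.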

Upvotes: 0

dawg

Reputation: 103814

With a range of regexes, such as /^c/ to /^dd/, you can use sed on a sorted file:

$ sed -nE '/^c/,/^dd/p' file.txt
[email protected]
[email protected]
[email protected]

Or perl:

$ perl -ne 'print if /^c/ .. /^dd/' file.txt
[email protected]
[email protected]
[email protected]

Or awk:

$ awk '/^c/,/^dd/' file.txt
[email protected]
[email protected]
[email protected]
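One caveat with the range forms above: the range closes on the first line that matches the end pattern, so a range ending at /^mj/ would stop at the first "mj" word rather than including all of them. A possible workaround, sketched here for the three chunks in the question, is to stop at the first line of the next chunk instead. This assumes sorted input; the `words.txt` file name and the `chunk1`..`chunk3` variables are hypothetical.

```shell
# Hypothetical file holding the question's sample input (already sorted).
tmpdir=$(mktemp -d)
words="$tmpdir/words.txt"
printf '%s\n' about block echo far maps mjalgo mjprou mksomething \
    november opshacom oscar softball zorro > "$words"

# Chunk 1: print every line until the first "f" word, then quit without printing it.
chunk1=$(sed -n '/^f/q; p' "$words")

# Chunk 2: quit at the first "mk" word; otherwise print from the first "f" word onward.
chunk2=$(sed -n '/^mk/q; /^f/,$p' "$words")

# Chunk 3: print from the first "mk" word to the end of the file.
chunk3=$(sed -n '/^mk/,$p' "$words")

printf '%s\n---\n' "$chunk1" "$chunk2" "$chunk3"
```

If no line starts with "mk", the second command prints through to the end of the file, so this only behaves as intended when each boundary actually occurs in the input.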

Based on the new post:

If you wish to group by different regex matches, awk is your best bet (or multiple runs of sed, grep, etc.).

Example:

$ cat file.txt
about
block
echo
far
maps
mjalgo
mjprou
mksomething
november
opshacom
oscar
softball
zorro

You can do:

$ awk '/^[a-e]/               {print $0>"f1.txt"; next}
     # [f-l] rather than [f-k], so words starting with "l" are not dropped
     /^[f-l]/ || /^m[a-j]/    {print $0>"f2.txt"; next}
     /^m[k-z]/ || /^[n-z]/    {print $0>"f3.txt"; next}
     ' file.txt

Then you have your 3 buckets in 3 different files:

for fn in f{1..3}.txt; do
    sort "$fn"
    echo "==="
done

Prints:

about
block
echo
===
far
maps
mjalgo
mjprou
===
mksomething
november
opshacom
oscar
softball
zorro
===

If the input is already sorted, sorting each file is unnecessary. If you have gawk rather than a strictly POSIX awk, you can also sort the lines inside awk.

Upvotes: 2

James Brown

Reputation: 37404

$ awk '$0>="c" && $0<"dd"' file
[email protected]
[email protected]
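Applied to the three chunks in the question, the same string-comparison idea might look like the sketch below. It assumes lowercase ASCII words, so that plain string ordering matches the alphabetical order the question expects; the `words.txt` file name and the `chunk1`..`chunk3` variables are hypothetical.

```shell
# Hypothetical file holding the question's sample input.
tmpdir=$(mktemp -d)
words="$tmpdir/words.txt"
printf '%s\n' about block echo far maps mjalgo mjprou mksomething \
    november opshacom oscar softball zorro > "$words"

# Chunk 1: everything that sorts before "f" (all a..e words).
chunk1=$(awk '$0 < "f"' "$words")

# Chunk 2: from "f" up to, but not including, "mk" (so every "mj" word is kept).
chunk2=$(awk '$0 >= "f" && $0 < "mk"' "$words")

# Chunk 3: "mk" and everything after it.
chunk3=$(awk '$0 >= "mk"' "$words")

printf '%s\n---\n' "$chunk1" "$chunk2" "$chunk3"
```

Since each line is tested on its own, this form does not depend on the input being sorted, unlike the range-based commands.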

Upvotes: 1
