Samuel Tan

Reputation: 1750

Print file up to Nth match

I'm trying to split a file up. sed can be used to do this, for example

sed -e '0,/expr/d' filename

would give the bottom half of the file, after the first "expr". But what if there is more than one occurrence and I want to split after the nth occurrence? I figured out that if I want to split at the second occurrence, then

sed -e '0,/expr/! {/expr/,$d}' filename

gives the top half of the file, up to but not including the second match of "expr". The exclamation point (!) negates the first address range, so the commands in the braces are applied only to the lines outside it, that is, the lines after the first match.
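To make the behaviour of the two commands above concrete, here is a small sketch on a made-up five-line input. The sample text and the variable names are illustrative only, and the 0,/expr/ address form assumes GNU sed.

```shell
# Hypothetical sample input: "expr" occurs on lines 2 and 4.
sample=$(printf 'a\nexpr 1\nb\nexpr 2\nc\n')

# Delete from the start through the first match: keeps what follows it.
bottom=$(printf '%s\n' "$sample" | sed -e '0,/expr/d')

# Outside the first range, delete from the next match to the end of input:
# keeps everything before the second match.
top=$(printf '%s\n' "$sample" | sed -e '0,/expr/! {/expr/,$d}')

printf 'bottom:\n%s\n\ntop:\n%s\n' "$bottom" "$top"
```

Here bottom holds the lines b, expr 2, c and top holds the lines a, expr 1, b.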

But what about more general cases? For example, from the second last occurrence.

I've been using sed here, but I think awk would have elegant solutions too.

Upvotes: 4

Views: 417

Answers (3)

potong

Reputation: 58420

This might work for you (GNU sed):

sed -nr 'x;/^X{2}/{x;p;b};x;/REGEXP/{x;s/^/X/;x}' file

This will print out anything after the 2nd match of REGEXP.

N.B. The REGEXP may occur one or more times per line, but each matching line is only counted once.
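The same idea can be parameterised on the match count. The following is only a sketch: after_nth is a hypothetical helper name, it assumes GNU sed (for -r), and it assumes the regular expression contains no "/" characters.

```shell
# after_nth N REGEXP: print the lines that follow the Nth matching line.
# The hold space accumulates one "X" per matching line; once it starts
# with N of them, every remaining line is printed.
after_nth() {
  sed -nr "x;/^X{$1}/{x;p;b};x;/$2/{x;s/^/X/;x}"
}

# Sample input: "foo" matches on lines 2, 4 and 6.
printf 'a\nfoo 1\nb\nfoo 2\nc\nfoo 3\nd\n' | after_nth 2 'foo'
```

With this input the command prints the lines c, foo 3 and d: the second matching line itself is not included.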

Upvotes: 0

jkshah

Reputation: 11703

Some more awk variations, in addition to @rici's solutions:

  1. Up to and including the $nth match:

    awk -v n=$n 'p<n; /regex/{p++}' file

  2. Up to but not including the $nth match:

    awk -v n=$n '/regex/{p++} p<n' file

  3. From and including the $nth match:

    awk -v n=$n '/regex/{p++} p>=n' file

  4. From but not including the $nth match:

    awk -v n=$n 'p>=n; /regex/{p++}' file
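As a quick check of the four variants above, this sketch runs each of them on a made-up seven-line input with n=2. The input function and the variable names are illustrative assumptions, not part of the answer.

```shell
# Sample input: /regex/ matches on lines 2, 4 and 6.
input() { printf 'a\nregex 1\nb\nregex 2\nc\nregex 3\nd\n'; }
n=2

upto_incl=$(input | awk -v n="$n" 'p<n; /regex/{p++}')   # a, regex 1, b, regex 2
upto_excl=$(input | awk -v n="$n" '/regex/{p++} p<n')    # a, regex 1, b
from_incl=$(input | awk -v n="$n" '/regex/{p++} p>=n')   # regex 2, c, regex 3, d
from_excl=$(input | awk -v n="$n" 'p>=n; /regex/{p++}')  # c, regex 3, d

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$upto_incl" "$upto_excl" "$from_incl" "$from_excl"
```

The only difference between the "including" and "not including" forms is whether the counter p is incremented before or after the print test runs on the matching line.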


But what about more general cases? For example, from the second last occurrence.

In that case, a simple approach is to read the file in reverse with tac, apply one of the options above, and reverse the result again.

  1. From and including $nth last match

    tac file | awk -v n=$n 'p<n; /regex/{p++}' | tac

  2. From and not including $nth last match

    tac file | awk -v n=$n '/regex/{p++} p<n' | tac

  3. Up to and including $nth last match

    tac file | awk -v n=$n '/regex/{p++} p>=n' | tac

  4. Up to and not including $nth last match

    tac file | awk -v n=$n 'p>=n; /regex/{p++}' | tac
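A small sketch of the reversed pipeline on made-up input, assuming tac is available (it is part of GNU coreutils). The input function and variable names are illustrative.

```shell
# Sample input: /regex/ matches on lines 2, 4 and 6.
input() { printf 'a\nregex 1\nb\nregex 2\nc\nregex 3\nd\n'; }
n=2

# From and including the 2nd-last match: regex 2, c, regex 3, d
last_from_incl=$(input | tac | awk -v n="$n" 'p<n; /regex/{p++}' | tac)

# Up to and not including the 2nd-last match: a, regex 1, b
last_upto_excl=$(input | tac | awk -v n="$n" 'p>=n; /regex/{p++}' | tac)

printf '%s\n---\n%s\n' "$last_from_incl" "$last_upto_excl"
```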


Note for OS X users, as pointed out by @mklement0 in the comments:

  • Stock OS X (as of OS X 10.9) does not ship tac.

  • On OS X you can use tail -r instead (note that tail on Linux appears not to support -r).
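Where neither tac nor tail -r is available, one possible alternative is to let awk read the file twice: once to count the matching lines, and once to print. This is a different technique from the pipelines above, shown only as a sketch; from_nth_last is a hypothetical helper name, and the approach needs a regular file rather than a pipe.

```shell
# from_nth_last N REGEX FILE: print from and including the Nth-last match.
from_nth_last() {
  awk -v n="$1" -v re="$2" '
    NR == FNR { if ($0 ~ re) total++; next }  # pass 1: count matching lines
    $0 ~ re   { seen++ }                      # pass 2: matches seen so far
    seen > total - n                          # true from the Nth-last match on
  ' "$3" "$3"
}

# Made-up sample file: /regex/ matches on lines 2, 4 and 6.
tmp=$(mktemp)
printf 'a\nregex 1\nb\nregex 2\nc\nregex 3\nd\n' > "$tmp"
result=$(from_nth_last 2 'regex' "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

For this sample the result is the lines regex 2, c, regex 3 and d. As with the tac version of the same variant, the whole file is printed when it contains fewer than N matches.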

Upvotes: 2

rici

Reputation: 241721

Simple awk solutions:

  1. Up to and including the $nth match of /regex/:

    awk -vn=$n '{print}/regex/&&!--n{exit}'

  2. Up to but not including the $nth match:

    awk -vn=$n '/regex/&&!--n{exit}{print}'

    In both of the above cases, setting n to 0 will print the whole file. Also, both uses of {print} can be changed to 1; (including the semicolon) because the default action is {print}. (Or just 1 in the second program.)

    For completeness:

  3. Everything after the $nth match:

    awk -vn=$n 'n<=0;/regex/{--n}'

Note: As pointed out in a comment by @mklement0, there is a bug in command-line option parsing in versions of BSD Awk (aka "one-true-awk", the version written and as far as I know still maintained by Brian Kernighan) prior to May 23, 2010; this apparently includes the version distributed with Mac OS X (as of v10.9). As a result, if you use one of these awk versions, you need to write -v n=$n instead of -vn=$n.
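For illustration, here are the three programs run on a made-up input with n=2, written with the portable -v n=... spelling and with extra spaces for readability. The input function and variable names are assumptions of this sketch.

```shell
# Sample input: /regex/ matches on lines 2, 4 and 6.
input() { printf 'a\nregex 1\nb\nregex 2\nc\nregex 3\nd\n'; }
n=2

upto_incl=$(input | awk -v n="$n" '{print} /regex/ && !--n {exit}')  # a, regex 1, b, regex 2
upto_excl=$(input | awk -v n="$n" '/regex/ && !--n {exit} {print}')  # a, regex 1, b
after=$(input | awk -v n="$n" 'n<=0; /regex/{--n}')                  # c, regex 3, d

printf '%s\n---\n%s\n---\n%s\n' "$upto_incl" "$upto_excl" "$after"
```

Because the first two programs call exit at the nth match, awk stops reading there instead of scanning the rest of the input.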

Upvotes: 2
