Reputation: 1750

Print file up to Nth match

I'm trying to split a file up. sed can be used to do this, for example

sed -e '0,/expr/d' filename

would give the bottom half of the file, after "expr". But what if there is more than one occurrence and I want to split after the nth occurrence? I figured out that if I want to split at the second occurrence, then

sed -e '0,/expr/! {/expr/,$d}' filename

gives the top half of the file up to the second match of "expr". The exclamation point (!) tells it to ignore the first range and only apply the commands in the braces to the other parts of the file.

But what about more general cases? For example, from the second last occurrence.

I've been using sed here, but I think awk would have elegant solutions too.

Upvotes: 4

Answers (3)

potong

Reputation: 58420

This might work for you (GNU sed):

sed -nr 'x;/^X{2}/{x;p;b};x;/REGEXP/{x;s/^/X/;x}' file

This prints every line after the second line that matches REGEXP (the matching line itself is not printed).

N.B. REGEXP may occur more than once on a line, but each line is counted only once.

Upvotes: 0

jkshah

Reputation: 11703

Some more awk variations, in addition to @rici's solutions:

Up to and including the $nth match:

awk -v n=$n 'p<n; /regex/{p++}' file

Up to but not including the $nth match:

awk -v n=$n '/regex/{p++} p<n' file

From and including the $nth match:

awk -v n=$n '/regex/{p++} p>=n' file

From but not including the $nth match:

awk -v n=$n 'p>=n; /regex/{p++}' file
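To make the difference between the four variants concrete, here is a small sketch that runs them against a made-up seven-line file. The file contents, the pattern foo, and the variable names are assumptions chosen only for illustration.

```shell
# Hypothetical sample: the 2nd, 4th and 6th lines match /foo/.
sample=$(mktemp)
printf '%s\n' a foo1 b foo2 c foo3 d > "$sample"
n=2

# Up to and including the 2nd match.
upto_incl=$(awk -v n="$n" 'p<n; /foo/{p++}' "$sample")
# Up to but not including the 2nd match.
upto_excl=$(awk -v n="$n" '/foo/{p++} p<n' "$sample")
# From and including the 2nd match.
from_incl=$(awk -v n="$n" '/foo/{p++} p>=n' "$sample")
# From but not including the 2nd match.
from_excl=$(awk -v n="$n" 'p>=n; /foo/{p++}' "$sample")

# Print each result on one line (the unquoted expansion joins lines with spaces).
echo "up to, inclusive: $(echo $upto_incl)"   # a foo1 b foo2
echo "up to, exclusive: $(echo $upto_excl)"   # a foo1 b
echo "from, inclusive:  $(echo $from_incl)"   # foo2 c foo3 d
echo "from, exclusive:  $(echo $from_excl)"   # c foo3 d

rm -f "$sample"
```

The only thing that changes between the inclusive and exclusive forms is whether the print test runs before or after the counter is incremented.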

But what about more general cases? For example, from the second last occurrence.

In that case, a simple approach is to read the file in reverse with tac, apply one of the options above, and then reverse the output again.

From and including the $nth last match:

tac file | awk -v n=$n 'p<n; /regex/{p++}' | tac

From but not including the $nth last match:

tac file | awk -v n=$n '/regex/{p++} p<n' | tac

Up to and including the $nth last match:

tac file | awk -v n=$n '/regex/{p++} p>=n' | tac

Up to but not including the $nth last match:

tac file | awk -v n=$n 'p>=n; /regex/{p++}' | tac
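As a quick check of the reversed variants, the sketch below runs them with n=2 on a made-up file, so the counted match is the second one from the end. It assumes tac is installed (GNU coreutils). The sample data and variable names are invented for illustration.

```shell
# Hypothetical sample: three lines match /foo/; the 2nd-last match is foo2.
sample=$(mktemp)
printf '%s\n' a foo1 b foo2 c foo3 d > "$sample"
n=2

# From and including the 2nd-last match.
from_incl=$(tac "$sample" | awk -v n="$n" 'p<n; /foo/{p++}' | tac)
# From but not including the 2nd-last match.
from_excl=$(tac "$sample" | awk -v n="$n" '/foo/{p++} p<n' | tac)
# Up to and including the 2nd-last match.
upto_incl=$(tac "$sample" | awk -v n="$n" '/foo/{p++} p>=n' | tac)
# Up to but not including the 2nd-last match.
upto_excl=$(tac "$sample" | awk -v n="$n" 'p>=n; /foo/{p++}' | tac)

echo "from, inclusive:  $(echo $from_incl)"   # foo2 c foo3 d
echo "from, exclusive:  $(echo $from_excl)"   # c foo3 d
echo "up to, inclusive: $(echo $upto_incl)"   # a foo1 b foo2
echo "up to, exclusive: $(echo $upto_excl)"   # a foo1 b

rm -f "$sample"
```

Because the file is read backwards, the "up to" programs from the forward case become the "from" programs here, and vice versa.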

Note for OS X users, as pointed out by @mklement0 in the comments:

Poor [stock] OS X users (as of OS X 10.9) are out of luck: no tac there.
On OS X you can use tail -r (note that tail on Linux appears not to support -r).
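One way to cope with both platforms is a small wrapper that picks whichever tool exists. This is only a sketch: reverse_lines is a hypothetical helper name, not a standard command, and the sample input is made up.

```shell
# Reverse the order of lines using whichever tool is installed.
# reverse_lines is a hypothetical helper, defined here for illustration.
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$@"        # GNU coreutils
    else
        tail -r "$@"    # BSD / OS X tail
    fi
}

# From and including the 2nd-last match of /foo/ in a made-up input.
result=$(printf '%s\n' a foo1 b foo2 c foo3 d |
    reverse_lines | awk -v n=2 'p<n; /foo/{p++}' | reverse_lines)

echo "$result"   # prints foo2, c, foo3, d on separate lines
```

The wrapper is used here only as a filter on standard input, which both tac and tail -r handle the same way.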

Upvotes: 2

rici

Reputation: 241721

Simple awk solutions:

Up to and including the $nth match of /regex/:

awk -vn=$n '{print}/regex/&&!--n{exit}'

Up to but not including the $nth match:

awk -vn=$n '/regex/&&!--n{exit}{print}'

In both of the above cases, setting n to 0 will print the whole file. Also, each {print} can be replaced by 1; (an always-true pattern followed by a semicolon), because the default action is {print}. In the second program, just 1 is enough.
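The following sketch compares the long and short spellings on a made-up file and checks the n=0 behaviour. The sample data, the pattern foo, and the variable names are assumptions. The option is written as -v n=2 with a space, which every awk should accept.

```shell
# Hypothetical sample: the 2nd, 4th and 6th lines match /foo/.
sample=$(mktemp)
printf '%s\n' a foo1 b foo2 c foo3 d > "$sample"

# Long and short forms of "up to and including the 2nd match".
incl_long=$(awk -v n=2 '{print}/foo/&&!--n{exit}' "$sample")
incl_short=$(awk -v n=2 '1;/foo/&&!--n{exit}' "$sample")

# Long and short forms of "up to but not including the 2nd match".
excl_long=$(awk -v n=2 '/foo/&&!--n{exit}{print}' "$sample")
excl_short=$(awk -v n=2 '/foo/&&!--n{exit}1' "$sample")

# With n=0 the counter goes negative and never hits zero, so awk never exits early.
whole=$(awk -v n=0 '{print}/foo/&&!--n{exit}' "$sample")

echo "inclusive: $(echo $incl_long) | $(echo $incl_short)"   # a foo1 b foo2 | a foo1 b foo2
echo "exclusive: $(echo $excl_long) | $(echo $excl_short)"   # a foo1 b | a foo1 b
echo "n=0:       $(echo $whole)"                             # a foo1 b foo2 c foo3 d

rm -f "$sample"
```

In the first program the semicolon after 1 is needed to separate it from the next pattern. In the second, the closing brace of {exit} already ends the rule.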

For completeness:
Everything after the $nth match:

awk -vn=$n 'n<=0;/regex/{--n}'

Note: As pointed out in a comment by @mklement0, there is a bug in command-line option parsing in versions of BSD Awk (aka "one-true-awk", the version written and as far as I know still maintained by Brian Kernighan) prior to May 23, 2010; this apparently includes the version distributed with Mac OS X (as of v10.9). As a result, if you use one of these awk versions, you need to write -v n=$n instead of -vn=$n.
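Putting the pieces together for the original goal of splitting a file, here is a sketch that writes the part up to and including the $nth match to one file and the rest to another. It uses the -v n=$n spelling from the note above. The directory layout, file names, and sample data are made up for illustration.

```shell
# Hypothetical working directory and sample input.
workdir=$(mktemp -d)
printf '%s\n' a foo1 b foo2 c foo3 d > "$workdir/input"
n=2

# Head: everything up to and including the 2nd match.
awk -v n="$n" '{print}/foo/&&!--n{exit}' "$workdir/input" > "$workdir/head"
# Tail: everything after the 2nd match.
awk -v n="$n" 'n<=0;/foo/{--n}' "$workdir/input" > "$workdir/tail"

# Concatenating the two halves should reproduce the input byte for byte.
if cat "$workdir/head" "$workdir/tail" | cmp -s - "$workdir/input"; then
    split_ok=yes
else
    split_ok=no
fi
echo "halves reassemble to the input: $split_ok"

tail_part=$(cat "$workdir/tail")
echo "tail half: $(echo $tail_part)"   # c foo3 d

rm -rf "$workdir"
```

The two programs are complementary as long as both are given the same n, so no line is lost or duplicated at the split point.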

Upvotes: 2
