trilloyd

Reputation: 101

Match only the first paragraph using bash

We have

...a file containing paragraphs, separated by two newlines (\r\n\r\n or \n\n). The paragraphs themselves may contain single newlines (\r\n or \n). The goal is to use a Bash one-liner to match only the first paragraph and print it to stdout.

For example:

$ cat foo.txt
Foo
* Bar

Baz
* Foobar

Even more stuff to match here.

results in:

$ cat foo.txt | <some-command>
Foo
* Bar

I've already tried

...this regex (?s)(.+?)(\r?\n){2}|.+?$ with grep using

The first two approaches resulted in:

$ grep -Poz '(?s)(.+?)(\r?\n){2}|.+?$' foo.txt
Foo
* Bar

Baz
* Foobar

The approach on Mac failed, due to differences between BSD grep and GNU grep.

But

... on regex101.com this regex works on foo.txt: https://regex101.com/r/uoej8O/1. This may be due to disabling the global flag?

Upvotes: 8

Views: 626

Answers (5)

James Brown

Reputation: 37424

With GNU awk, if the paragraphs are separated by \r\n\r\n or \n\n:

$ awk -v RS="\r?\n\r?\n" '{print $0;exit}' file

Output:

Foo
* Bar

Upvotes: 5

Wiktor Stribiżew

Reputation: 627087

You can use a GNU grep like this:

grep -Poz '(?s)^.+?(?=\R{2}|$)' file

See the PCRE regex demo.

Details

  • (?s) - a DOTALL inline modifier that makes . match all chars including linebreak chars
  • ^ - start of the whole string
  • .+? - any 1 or more chars, as few as possible
  • (?=\R{2}|$) - a positive lookahead that matches a location immediately followed by a double line break sequence (\R{2}) or the end of the string ($).

Upvotes: 4

anubhava

Reputation: 785481

This problem is tailor-made for GNU awk with a custom record separator. Setting RS to a regex splits the input wherever two or more line breaks (each an optional \r followed by \n) occur in a row:

awk -v RS='(\r?\n){2,}' 'NR == 1' file

This outputs:

Foo
* Bar

If the input is very large, it is more efficient to make awk exit right after printing the first record instead of reading the rest of the file:

awk -v RS='(\r?\n){2,}' '{print; exit}' file
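
Where GNU awk is not available (the question mentions BSD tools on a Mac), a similar result can be sketched with POSIX awk's paragraph mode, which an empty RS enables and which splits records on blank lines. This is a different technique from the regex RS above, and the function name first_paragraph is only illustrative. Carriage returns are stripped first, so the output always uses \n line endings:

```shell
# Hypothetical helper: print the first blank-line-separated paragraph of stdin.
first_paragraph() {
  # tr drops every carriage return, so \r\n\r\n is seen as \n\n;
  # an empty RS puts awk into paragraph mode (records split on blank lines).
  tr -d '\r' | awk -v RS= '{ print; exit }'
}

# Usage with CRLF input modelled on the foo.txt sample from the question.
printf 'Foo\r\n* Bar\r\n\r\nBaz\r\n* Foobar\r\n' | first_paragraph
```

Note that tr removes every \r in the input, including any that are not part of a line ending.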

Upvotes: 8

potong

Reputation: 58473

This might work for you (GNU sed):

sed 'N;P;/\n\r\?$/Q;D' file

Open a two-line window and print its first line. If the second line is empty (or holds only a carriage return), quit without printing anything else; otherwise delete the first line and repeat.

Upvotes: 0

rethab

Reputation: 8433

If you only want the first paragraph and the paragraphs are separated by an empty line (plain \n line endings), then this might work:

awk '!NF{ exit } 1' foo.txt
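
With \r\n line endings the separating line still holds a carriage return, which awk's default field splitting counts as a field, so !NF does not fire there. A minimal sketch that tests the line against a regex instead, assuming an awk that honors the \r escape in regular expressions (POSIX specifies it). The function name is illustrative:

```shell
# Hypothetical helper: print lines until the first empty line, where "empty"
# also covers a line that holds only a carriage return (CRLF input).
first_paragraph_lines() {
  awk '/^\r?$/ { exit } 1'
}

# Usage with CRLF input modelled on the foo.txt sample from the question.
printf 'Foo\r\n* Bar\r\n\r\nBaz\r\n* Foobar\r\n' | first_paragraph_lines
```

Unlike a variant that strips carriage returns, this leaves the original line endings on the printed lines.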

Upvotes: 0
