Rio

Reputation: 14892

grep all characters including newline

I'm parsing an XML file with

"lalala it's a Sunday {{ Some words here, maybe
a new line }} oh boy"

How can I use grep to get everything between "{{" and "}}", given that grep's . character doesn't match newlines?

Currently I have

grep '{{.*}}'

but it only works on things that are on the same line.

Upvotes: 8

Views: 21767

Answers (4)

Peter K

Reputation: 1807

This worked for me (-z makes GNU grep treat the whole file as one record, and [:cntrl:] covers the newline):

grep -zo '{{[[:cntrl:][:print:]]*}}'

Upvotes: 0

Yuri Barbashov

Reputation: 909

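This is how I solved that problem (with GNU grep, -P enables Perl-compatible regexps so that [\s\S] matches any character, and -z reads the whole file as one record):

   grep -Pzo '{{[\s\S]*}}'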
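
For illustration, here is a minimal sketch of a non-greedy PCRE variant. It assumes GNU grep built with -P support, and the function name extract_braces_pcre is hypothetical. (?s) lets . match newlines, .*? stops at the first }}, and because -z makes grep emit NUL-terminated matches, tr turns the NULs back into newlines.

```shell
# Hypothetical helper: print each {{ ... }} block, even when it spans lines.
# Assumes GNU grep with PCRE (-P) support; not portable to other greps.
extract_braces_pcre() {
  grep -Pzo '(?s)\{\{.*?\}\}' "$@" | tr '\0' '\n'
}

# Usage with the sample text from the question:
printf 'lalala {{ Some words here, maybe\na new line }} oh boy\n' | extract_braces_pcre
```

Nested {{ are not handled by this sketch.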

Upvotes: 2

Jesse Cohen

Reputation: 4040

One option is to remove the newlines and then grep, as in:

 cat myfile | tr -d '\n' | grep '{{.*}}'

But if you say this is an XML file, why not use an XML parser that takes advantage of the file's inherent structure rather than just regexp?

EDIT

grep regexps are greedy; for non-greedy matching you can use Perl regexps:

cat myfile | tr -d '\n' | perl -pe 's/.*?(\{\{.*?\}\})/$1\n/g' | grep '{{'

This should output one match per line. If you have nested {{ then this will get even more complicated.

Upvotes: 8

Phrogz

Reputation: 303361

You can use alternation between mutually exclusive character sets to match truly any character. For example, this command:

grep -E "\{\{([[:digit:]]|[^[:digit:]])+\}\}"

...will match anything (greedily) between the first {{ and last }}.

But as @JesseCohen states, you really, really, really should be parsing XML with an XML parser, not regexps.

Upvotes: 1
