surfearth

Reputation: 3147

Match inclusive between start and end over multiple lines

I have a large number of TOML files, some of which contain a parameter I would like to remove. I'm having difficulty building a regex that matches the starting text "categories =" through the ending text "]". With the sample below, my regex matches the text between the start and end text, but not the start and end text themselves. How do I modify the regex so the match includes the start text, the end text, and everything in between?

My current regex is: (?<=categories)(.*)(?=])

The sample .toml contains:

+++
slug = "twenty-years-from-now-you-will-be-more"
description = ""
tags = [
  "Quoteoftheday",
  "Quote",
]
categories = [
  "Quoteoftheday",
  "Quote",
]
date = 2014-01-16T07:13:10-08:00
title = "twenty years from now..."
draft = false

+++

The text I want to capture with the regex is:

categories = [
  "Quoteoftheday",
  "Quote",
]


Upvotes: 1

Views: 1572

Answers (2)

anubhava

Reputation: 784958

A negated character class makes this work without the DOTALL (s) flag, so it also works in flavors that lack DOTALL, such as JavaScript before ES2018. The ] inside the class is escaped because JavaScript reads an unescaped [^] as "any character".

\ncategories([^\]]*)\]
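As a rough illustration, here is how the negated-character-class pattern might be used from Python. This is only a sketch: the `SAMPLE` string is a shortened copy of the front matter from the question, and the names `CATEGORIES`, `block`, and `cleaned` are made up for the example.

```python
import re

# Shortened version of the sample front matter from the question.
SAMPLE = '''+++
description = ""
tags = [
  "Quoteoftheday",
  "Quote",
]
categories = [
  "Quoteoftheday",
  "Quote",
]
draft = false
+++
'''

# [^\]]* matches any run of characters except "]", newlines included,
# so no DOTALL flag is needed.
CATEGORIES = re.compile(r"\ncategories([^\]]*)\]")

match = CATEGORIES.search(SAMPLE)
# The match begins with the newline that precedes "categories"; strip it for display.
block = match.group(0).lstrip("\n")
# Removing the whole match (leading newline included) leaves no blank line behind.
cleaned = CATEGORIES.sub("", SAMPLE)

print(block)
print(cleaned)
```

Because the pattern starts with \n, it only matches when "categories" begins a line, and it would miss a "categories" key on the very first line of a file.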



To do the removal with sed, use this command (-i.bak edits the file in place and keeps a backup as file.bak):

sed -i.bak '/^categories[ \t]*=/,/\]/d' file

cat file

+++
slug = "twenty-years-from-now-you-will-be-more"
description = ""
tags = [
  "Quoteoftheday",
  "Quote",
]
date = 2014-01-16T07:13:10-08:00
title = "twenty years from now..."
draft = false

+++
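For anyone who would rather script this than call sed, the same range-delete idea can be sketched in Python. This is an approximation under assumptions, not a drop-in replacement: the function name `drop_categories` and the shortened `SAMPLE` are invented for the example, and like the sed range, it only looks for the closing ] on lines after the one that starts the block.

```python
import re

START = re.compile(r"^categories[ \t]*=")  # same start address as the sed command
END = re.compile(r"\]")                    # same end address as the sed command

def drop_categories(lines):
    """Return lines with the categories block removed (hypothetical helper)."""
    kept = []
    skipping = False
    for line in lines:
        if not skipping and START.search(line):
            skipping = True      # drop the start line, begin the range
            continue
        if skipping:
            if END.search(line):
                skipping = False  # drop the end line, close the range
            continue
        kept.append(line)
    return kept

SAMPLE = '''tags = [
  "Quote",
]
categories = [
  "Quoteoftheday",
  "Quote",
]
draft = false
'''

result = "".join(drop_categories(SAMPLE.splitlines(keepends=True)))
print(result)
```

To process many files, the same function could be applied to each file's lines and the result written back, ideally after making a backup as the sed command does.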

Upvotes: 1

m87

Reputation: 4523

Try the following regex:

(?s)categories[\s=\[]+(.*?)]

Explanation

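  • (?s) single-line (DOTALL) flag, which lets . match newline characters as well
  • categories[\s=\[]+ match 'categories' followed by one or more whitespace, = or [ characters
  • (.*?)] lazily match any characters up to the first ], then the ] itself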
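A quick way to check what this pattern captures is to run it in Python, where the inline (?s) flag is supported. The snippet below is a sketch: `SAMPLE` is a shortened copy of the question's front matter and the variable names are made up for the example.

```python
import re

SAMPLE = '''tags = [
  "Quote",
]
categories = [
  "Quoteoftheday",
  "Quote",
]
draft = false
'''

# (?s) lets "." cross line breaks; the lazy (.*?) stops at the first "]".
PATTERN = re.compile(r"(?s)categories[\s=\[]+(.*?)]")

match = PATTERN.search(SAMPLE)
whole = match.group(0)  # start text, end text, and everything in between
inner = match.group(1)  # only the list items between "[" and "]"

print(whole)
print(repr(inner))
```

Note that this pattern is not anchored to the start of a line, so it would also match a key that merely ends in "categories"; prefixing it with a line anchor is one way to tighten it if that matters for your files.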


Upvotes: 1
