0xbadf00d

Reputation: 18218

Regex Match all characters between two strings

Example: This is just\na simple sentence.

I want to match every character between "This is" and "sentence". Line breaks should be ignored. I can't figure out the correct syntax.

Upvotes: 723

Views: 1434392

Answers (18)

Cary Swoveland

Reputation: 110735

Rather than extract the bits you want you could replace the bits you don't want with empty strings.

In Ruby,

"This is just\na simple sentence".gsub(/^This is|sentence\z/, '')
  #=> " just\na simple "
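The same replace-rather-than-extract idea can be sketched in Python. This is only an illustration: the function name strip_delimiters is hypothetical, and it assumes "This is" sits at the very start of the string and "sentence" at the very end.

```python
import re

def strip_delimiters(text: str) -> str:
    # \A anchors at the start of the string, \Z at the end,
    # so only the leading "This is" and trailing "sentence" are removed
    return re.sub(r"\AThis is|sentence\Z", "", text)

result = strip_delimiters("This is just\na simple sentence")
print(repr(result))
```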

Upvotes: 1

MCheng

Reputation: 1210

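For Python:

import re

def match_between_strings(text, start_str, end_str):
    # Escape both delimiters so they match literally; (.*?) lazily captures the text in between
    pattern = re.escape(start_str) + r'(.*?)' + re.escape(end_str)
    # re.DOTALL lets . match newlines, so a match can span several lines
    matches = re.findall(pattern, text, re.DOTALL)
    return matches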

Example usage:

start_str = "This"
end_str = "sentence"
text = "This is just\na simple sentence"

result = match_between_strings(text, start_str, end_str)

Result

[' is just\na simple ']

Upvotes: 1

lamoboos223

Reputation: 41

I had this string

      headers:
        Date:
          schema:
            type: string
            example: Tue, 23 Aug 2022 11:36:23 GMT
        Content-Type:
          schema:
            type: string
            example: application/json; charset=utf-8
        Transfer-Encoding:
          schema:
            type: string
            example: chunked
        Connection:
          schema:
            type: string
            example: keep-alive
        Content-Encoding:
          schema:
            type: string
            example: gzip
        Vary:
          schema:
            type: string
            example: Accept-Encoding
        Server:
          schema:
            type: number
            example: Microsoft-IIS/10.0
        X-Powered-By:
          schema:
            type: string
            example: ASP.NET
        Access-Control-Allow-Origin:
          schema:
            type: string
            example: '*'
        Access-Control-Allow-Credentials:
          schema:
            type: boolean
            example: 'true'
        Access-Control-Allow-Headers:
          schema:
            type: string
            example: '*'
        Access-Control-Max-Age:
          schema:
            type: string
            example: '-1'
        Access-Control-Allow-Methods:
          schema:
            type: string
            example: GET, PUT, POST, DELETE
        X-Content-Type-Options:
          schema:
            type: string
            example: nosniff
        X-XSS-Protection:
          schema:
            type: string
            example: 1; mode=block
      content:
        application/json:

and I wanted to remove everything from headers: up to content, so I wrote this regex: (headers:)[^]*?(content)

It worked as expected, and it also showed how many times the expression occurred.
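A rough Python equivalent, as a sketch only: Python's re module does not accept [^], so this version assumes re.DOTALL with a lazy .*? instead, and uses a lookahead so that content itself is kept. The spec string is a shortened, made-up sample of the YAML above.

```python
import re

# Shortened sample of the YAML shown above (illustrative only)
spec = (
    "      headers:\n"
    "        Date:\n"
    "          schema:\n"
    "            type: string\n"
    "      content:\n"
    "        application/json:\n"
)

# Lazily match from "headers:" up to (but not including) the next "content"
pattern = re.compile(r"headers:.*?(?=content)", re.DOTALL)

count = len(pattern.findall(spec))   # how many times the block occurs
cleaned = pattern.sub("", spec)      # the text with those blocks removed
print(count)
print(cleaned)
```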

Upvotes: 0

The fourth bird

Reputation: 163557

In case of JavaScript you can use [^] to match any character including newlines.

Using the /s flag with a dot . to match any character also works, but is applied to the whole pattern and JavaScript does not support inline modifiers to turn on/off the flag.

To match as few characters as possible, you can make the quantifier non-greedy by appending a question mark, and use a capture group to extract the part in between.

This is([^]*?)sentence

See a regex101 demo.

As a side note, to avoid matching partial words you can use word boundaries, like \bThis and sentence\b

const s = "This is just\na simple sentence";
const regex = /This is([^]*?)sentence/;
const m = s.match(regex);

if (m) {
  console.log(m[1]);
}


The lookaround variant in JavaScript is (?<=This is)[^]*?(?=sentence) and you could check Lookbehind in JS regular expressions for the support.

Also see Important Notes About Lookbehind.

const s = "This is just\na simple sentence";
const regex = /(?<=This is)[^]*?(?=sentence)/;
const m = s.match(regex);

if (m) {
  console.log(m[0]);
}
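For comparison, here is a sketch of the same two variants in Python. It assumes re.DOTALL in place of [^] (which Python does not support) and adds the word boundaries mentioned above; the variable names are only illustrative.

```python
import re

s = "This is just\na simple sentence"

# Capture-group variant with word boundaries; DOTALL lets . match newlines
m = re.search(r"\bThis is(.*?)sentence\b", s, re.DOTALL)
captured = m.group(1) if m else None

# Lookaround variant: the whole match is the text in between
m2 = re.search(r"(?<=This is).*?(?=sentence)", s, re.DOTALL)
between = m2.group(0) if m2 else None

print(repr(captured))
print(repr(between))
```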

Upvotes: 5

kaore

Reputation: 1368

Try This is[\s\S]*?sentence, which works in JavaScript.
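The [\s\S] trick is not specific to JavaScript. As a small sketch, this is how it behaves in Python when no flags are set:

```python
import re

text = "This is just\na simple sentence"

# Without DOTALL, . stops at the newline, so this finds nothing
dot_match = re.search(r"This is.*?sentence", text)

# [\s\S] matches any character, newline included, with no flag needed
class_match = re.search(r"This is[\s\S]*?sentence", text)

print(dot_match)
print(repr(class_match.group(0)))
```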

Upvotes: 83

Yahya Hassani

Reputation: 46

Is there a way to deal with repeated instances of this pattern in a block of text? For instance: "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence." To match each instance instead of the entire string, use the code below:

import re

data = "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence."

# Lazy .*? stops at the nearest " sentence"; re.DOTALL lets . cross line breaks
pattern = re.compile(r'This is .*? sentence', re.DOTALL)

for match_instance in re.finditer(pattern, data):
    print(match_instance.group())  # replace print with your own handling

Upvotes: 2

Alexander Golovinov

Reputation: 533

A regex to match everything between two strings, using Java.

List<String> results = new ArrayList<>(); // For storing results
String example = "Code will save the world";

Let's use Pattern and Matcher objects with the regex (.*?).

Pattern p = Pattern.compile("Code (.*?) world");   // java.util.regex.Pattern
Matcher m = p.matcher(example);                      // java.util.regex.Matcher

Since the Matcher might find more than one match, we need to loop over the results and store them.

while (m.find()) {            // Loop through all matches
   results.add(m.group(1));   // Store the captured text between the two strings
}

For this example, results will contain only "will save the", but in a longer text it will probably find more matches.

Upvotes: 3

Roshna Omer

Reputation: 721

This worked for me (I'm using VS Code):

for: This is just\na simple sentence

Use: This .+ sentence

Upvotes: 5

alchemy

Reputation: 982

I landed here while searching for a regex to convert the Python 2 print syntax, print "string", in old scripts to print("string") for Python 3. It works well; for other conversions, use 2to3.py. Here is my solution for others:

Try it out on Regexr.com (it doesn't work in Notepad++ for some reason):

find:     (?<=print)( ')(.*)(')
replace: ('$2')

for variables:

(?<=print)( )(.*)(\n)
('$2')\n

for label and variable:

(?<=print)( ')(.*)(',)(.*)(\n)
('$2',$4)\n

How to replace all print "string" in Python2 with print("string") for Python3?
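The first find/replace pair can also be applied from a script. A minimal sketch in Python follows, where the function name convert_print is hypothetical and only single-quoted literals are handled. Like the original pattern, it would also fire on any other word ending in "print", so treat it as a starting point rather than a full converter.

```python
import re

def convert_print(source: str) -> str:
    # Rewrite  print 'text'  as  print('text').
    # . does not match newlines here, so each statement is handled on its own line.
    return re.sub(r"(?<=print) '(.*)'", r"('\1')", source)

old = "print 'hello'\nx = 1\nprint 'done'"
new = convert_print(old)
print(new)
```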

Upvotes: 1

vins

Reputation: 69

For a quick search in Vim, you can type this at the command prompt: /This is.*\_.*sentence

Upvotes: 0

Bbb

Reputation: 649

Here is how I did it:
This was easier for me than trying to figure out the specific regex necessary.

int indexPictureData = result.IndexOf("-PictureData:");
int indexIdentity = result.IndexOf("-Identity:");
string returnValue = result.Remove(indexPictureData + 13);
returnValue = returnValue + " [bytecoderemoved] " + result.Remove(0, indexIdentity);
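The same index-based approach can be sketched in Python without any regex. The function name redact_between and the sample string are made up for illustration; the two markers are the ones used above.

```python
def redact_between(result: str,
                   start: str = "-PictureData:",
                   end: str = "-Identity:") -> str:
    i = result.find(start)
    j = result.find(end)
    # If a marker is missing or they are out of order, leave the text unchanged
    if i == -1 or j == -1 or j < i + len(start):
        return result
    # Keep everything up to the end of the start marker, then everything from the end marker on
    return result[: i + len(start)] + " [bytecoderemoved] " + result[j:]

sample = "-Name: cat -PictureData: 0xDEADBEEF -Identity: 42"
redacted = redact_between(sample)
print(redacted)
```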

Upvotes: 1

rsc05

Reputation: 3820

Sublime Text 3.x

In Sublime Text, you simply write the two strings you are interested in, which in your case are

"This is" and "sentence"

and you write .* in between,

i.e. This is .* sentence

and this should work well.

Upvotes: -1

Cephos

Reputation: 21

In case anyone is looking for an example of this in a Jenkins context: the following parses build.log and, if it finds a match, fails the build with the matched text.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

node{    
    stage("parse"){
        def file = readFile 'build.log'

        def regex = ~"(?s)(firstStringToUse(.*)secondStringToUse)"
        Matcher match = regex.matcher(file)
        if (match.find()) {
            capturedText = match.group(1)
            error(capturedText)
        }
    }
}

Upvotes: 2

AnirbanDebnath

Reputation: 1043

You can simply use this: This is .*? sentence

Upvotes: 4

Riyafa Abdul Hameed

Reputation: 7983

This:

This is (.*?) sentence

works in JavaScript.

Upvotes: 29

zx81

Reputation: 41848

Lazy Quantifier Needed

Resurrecting this question because the regex in the accepted answer doesn't seem quite correct to me. Why? Because

(?<=This is)(.*)(?=sentence)

will match " my first sentence. This is my second " in "This is my first sentence. This is my second sentence."

See demo.

You need a lazy quantifier between the two lookarounds. Adding a ? makes the star lazy.

This matches what you want:

(?<=This is).*?(?=sentence)

See demo. I removed the capture group, which was not needed.

DOTALL Mode to Match Across Line Breaks

Note that in the demo the "dot matches line breaks" mode (a.k.a. dot-all) is set (see how to turn on DOTALL in various languages). In many regex flavors, you can set it with the inline modifier (?s), turning the expression into:

(?s)(?<=This is).*?(?=sentence)
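The difference between the greedy and lazy versions can be checked with a short Python sketch. The variable names are illustrative. Note that Python requires a global inline flag such as (?s) to sit at the very start of the pattern, and that its lookbehind must have a fixed width, which "This is" does.

```python
import re

text = "This is my first sentence. This is my second sentence."

# Greedy: runs to the LAST "sentence"
greedy = re.search(r"(?<=This is).*(?=sentence)", text, re.DOTALL).group(0)

# Lazy: stops at the NEXT "sentence"
lazy = re.search(r"(?<=This is).*?(?=sentence)", text, re.DOTALL).group(0)

# Same lazy pattern with the inline (?s) modifier instead of the flag argument
inline = re.search(r"(?s)(?<=This is).*?(?=sentence)", text).group(0)

# The lazy pattern finds every occurrence, one at a time
every = re.findall(r"(?s)(?<=This is).*?(?=sentence)", text)

print(repr(greedy), repr(lazy), repr(inline), every)
```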

Reference

Upvotes: 288

vignesh

Reputation: 223

use this: (?<=beginningstringname)(.*\n?)(?=endstringname)

Upvotes: 19

stema

Reputation: 93026

For example

(?<=This is)(.*)(?=sentence)

Regexr

I used lookbehind (?<=) and lookahead (?=) so that "This is" and "sentence" are not included in the match, but this depends on your use case; you can also simply write This is(.*)sentence.

The important thing here is that you activate the "dotall" mode of your regex engine, so that the . matches newlines. How you do this depends on your regex engine.

The next question is whether you use .* or .*?. The first is greedy and will match up to the last "sentence" in your string; the second is lazy and will match up to the next "sentence" in your string.

Update

Regexr

This is(?s)(.*)sentence

Here the (?s) turns on the dotall modifier, so that the . matches newline characters.

Update 2:

(?<=is \()(.*?)(?=\s*\))

matches your example "This is (a simple) sentence". See here on Regexr.
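As a quick check of the Update 2 pattern, here is a sketch in Python. The variable names are illustrative, and it assumes the text of interest sits inside the parentheses, as in the example.

```python
import re

example = "This is (a simple) sentence"

# Lookbehind requires "is (" just before the match;
# lookahead requires optional whitespace followed by ")"
m = re.search(r"(?<=is \()(.*?)(?=\s*\))", example)
inside = m.group(1) if m else None
print(inside)
```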

Upvotes: 1068
