asdasd
asdasd

Reputation: 5239

How to extract a substring using regex

I have a string that has two single quotes in it, the ' character. In between the single quotes is the data I want.

How can I write a regex to extract "the data i want" from the following text?

mydata = "some string with 'the data i want' inside";

Upvotes: 504

Views: 1076332

Answers (14)

Nikolas
Nikolas

Reputation: 44486

Since Java 9

As of this version, you can use a new method Matcher::results with no args that is able to comfortably return Stream<MatchResult> where MatchResult represents the result of a match operation and offers to read matched groups and more (this class is known since Java 1.5).

String string = "Some string with 'the data I want' inside and 'another data I want'.";

Pattern pattern = Pattern.compile("'(.*?)'");
pattern.matcher(string)
       .results()                       // Stream<MatchResult>
       .map(mr -> mr.group(1))          // Stream<String> - the 1st group of each result
       .forEach(System.out::println);   // print them out (or process in other way...)

The code snippet above results in:

the data I want
another data I want

The biggest advantage is in the ease of usage when one or more results is available compared to the procedural if (matcher.find()) and while (matcher.find()) checks and processing.

Upvotes: 21

Arindam
Arindam

Reputation: 723

Some how the group(1) didnt work for me. I used group(0) to find the url version.

Pattern urlVersionPattern = Pattern.compile("\\/v[0-9][a-z]{0,1}\\/");
Matcher m = urlVersionPattern.matcher(url);
if (m.find()) { 
    return StringUtils.substringBetween(m.group(0), "/", "/");
}
return "v0";

Upvotes: 0

Ganesh
Ganesh

Reputation: 767

add apache.commons dependency on your pom.xml

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-io</artifactId>
    <version>1.3.2</version>
</dependency>

And below code works.

StringUtils.substringBetween(String mydata, String "'", String "'")

Upvotes: 1

Noah Mohamed
Noah Mohamed

Reputation: 134

you can use this i use while loop to store all matches substring in the array if you use

if (matcher.find()) { System.out.println(matcher.group(1)); }

you will get on matches substring so you can use this to get all matches substring

Matcher m = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+").matcher(text);
   // Matcher  mat = pattern.matcher(text);
    ArrayList<String>matchesEmail = new ArrayList<>();
        while (m.find()){
            String s = m.group();
            if(!matchesEmail.contains(s))
                matchesEmail.add(s);
        }

    Log.d(TAG, "emails: "+matchesEmail);

Upvotes: 0

Memin
Memin

Reputation: 4090

Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods. In your case, the start and end substrings are the same, so just call the following function.

StringUtils.substringBetween(String str, String tag)

Gets the String that is nested in between two instances of the same String.

If the start and the end substrings are different then use the following overloaded method.

StringUtils.substringBetween(String str, String open, String close)

Gets the String that is nested in between two Strings.

If you want all instances of the matching substrings, then use,

StringUtils.substringsBetween(String str, String open, String close)

Searches a String for substrings delimited by a start and end tag, returning all matching substrings in an array.

For the example in question to get all instances of the matching substring

String[] results = StringUtils.substringsBetween(mydata, "'", "'");

Upvotes: 2

ZehnVon12
ZehnVon12

Reputation: 4176

String dataIWant = mydata.replaceFirst(".*'(.*?)'.*", "$1");

Upvotes: 10

ZehnVon12
ZehnVon12

Reputation: 4176

String dataIWant = mydata.split("'")[1];

See Live Demo

Upvotes: 2

Bohemian
Bohemian

Reputation: 425308

There's a simple one-liner for this:

String target = myData.replaceAll("[^']*(?:'(.*?)')?.*", "$1");

By making the matching group optional, this also caters for quotes not being found by returning a blank in that case.

See live demo.

Upvotes: 20

Mark Byers
Mark Byers

Reputation: 839044

Assuming you want the part between single quotes, use this regular expression with a Matcher:

"'(.*?)'"

Example:

String mydata = "some string with 'the data i want' inside";
Pattern pattern = Pattern.compile("'(.*?)'");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}

Result:

the data i want

Upvotes: 728

Beothorn
Beothorn

Reputation: 1317

You don't need regex for this.

Add apache commons lang to your project (http://commons.apache.org/proper/commons-lang/), then use:

String dataYouWant = StringUtils.substringBetween(mydata, "'");

Upvotes: 84

Daniel C. Sobral
Daniel C. Sobral

Reputation: 297265

In Scala,

val ticks = "'([^']*)'".r

ticks findFirstIn mydata match {
    case Some(ticks(inside)) => println(inside)
    case _ => println("nothing")
}

for (ticks(inside) <- ticks findAllIn mydata) println(inside) // multiple matches

val Some(ticks(inside)) = ticks findFirstIn mydata // may throw exception

val ticks = ".*'([^']*)'.*".r    
val ticks(inside) = mydata // safe, shorter, only gets the first set of ticks

Upvotes: 1

Debilski
Debilski

Reputation: 67888

Because you also ticked Scala, a solution without regex which easily deals with multiple quoted strings:

val text = "some string with 'the data i want' inside 'and even more data'"
text.split("'").zipWithIndex.filter(_._2 % 2 != 0).map(_._1)

res: Array[java.lang.String] = Array(the data i want, and even more data)

Upvotes: 11

Sean McEligot
Sean McEligot

Reputation: 317

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".*'([^']*)'.*");
        String mydata = "some string with 'the data i want' inside";

        Matcher matcher = pattern.matcher(mydata);
        if(matcher.matches()) {
            System.out.println(matcher.group(1));
        }

    }
}

Upvotes: 21

Mihai Toader
Mihai Toader

Reputation: 12253

as in javascript:

mydata.match(/'([^']+)'/)[1]

the actual regexp is: /'([^']+)'/

if you use the non greedy modifier (as per another post) it's like this:

mydata.match(/'(.*?)'/)[1]

it is cleaner.

Upvotes: 3

Related Questions