Reputation: 5628

Regex pattern for finding string between two characters - but first occurrence of the second character

I want a regex that finds the string between two characters, matching only from the start delimiter to the first occurrence of the end delimiter.

I want to extract the story value from lines of the following format:

<metadata name="user" story="{some_text_here}" />

So I want to extract only {some_text_here}.

For that I am using the following regex:

<metadata name="user" story="(.*)" \/>

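And the Java code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoryExtractor {
    public static void main(String[] args) throws IOException {
        // Same regex as above; quotes and the backslash are escaped for the Java string literal
        String regexString = "<metadata name=\"user\" story=\"(.*)\" \\/>";
        String filePath = "C:\\Desktop\\temp\\test.txt";
        Pattern p = Pattern.compile(regexString);
        Matcher m;
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = br.readLine()) != null) {
                m = p.matcher(line);
                if (m.find()) {
                    // Group 1 is whatever (.*) captured
                    System.out.println(m.group(1));
                }
            }
        }
    }
}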

This regex mostly works, but surprisingly, if the line is:

<metadata name="user" story="My name is Nick" extraStory="something" />

running the code prints My name is Nick" extraStory="something, whereas I only want My name is Nick.

I also want to make sure that there is nothing else between story="My name is Nick" and the closing />.

Upvotes: 0

Answers (3)

Aaron

Reputation: 24812

The following XPath should solve your problem:

//metadata[@name='user' and @story and count(@*) = 2]/@story

It addresses the story attribute of any metadata node in the document whose name attribute is user and which has a story attribute but no others (the attribute count is 2).

(Note: //metadata[@name='user' and count(@*)=2]/@story would be enough, since it is impossible to address the story attribute of a metadata node whose second attribute isn't story.)

In Java, supposing you are handling an instance of org.w3c.dom.Document and already have an instance of XPath available, the code would be the following:

String story = xPath.evaluate("//metadata[@name='user' and @story and count(@*) = 2]/@story", xmlDoc);

You can try the XPath here or the Java code here.
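
For readers who do not already have a Document and an XPath instance at hand, here is a minimal, self-contained sketch of the same query using only the JDK's built-in XML APIs. The class name StoryXPathDemo, the method extractStory and the wrapping <root> element are assumptions made for illustration; the question's input is a file of lines, so it would first have to be parsed as (or wrapped into) well-formed XML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original answer.
public class StoryXPathDemo {
    // The XPath expression from the answer above.
    static final String EXPRESSION =
            "//metadata[@name='user' and @story and count(@*) = 2]/@story";

    // Returns the story value of the first matching node, or "" when nothing matches.
    static String extractStory(String xml) {
        try {
            Document xmlDoc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // evaluate(String, Object) converts the selected node set to a string.
            return xPath.evaluate(EXPRESSION, xmlDoc);
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String plain = "<root><metadata name=\"user\" story=\"My name is Nick\" /></root>";
        String extra = "<root><metadata name=\"user\" story=\"My name is Nick\""
                + " extraStory=\"something\" /></root>";
        System.out.println(extractStory(plain)); // prints: My name is Nick
        System.out.println(extractStory(extra).isEmpty()); // prints: true
    }
}
```

An empty result means no metadata element satisfied all three conditions, which is how the extraStory case is rejected here.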

Upvotes: 1

nafas

Reputation: 5423

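Just use Jsoup. It is the right tool for the problem :).

It's this easy:

// Sample line from the question; in practice, read the contents of your file into this string
String html = "<metadata name=\"user\" story=\"My name is Nick\" />";
Document document = Jsoup.parse(html);
// attr() returns the story attribute of the first matching element
String story = document.select("metadata[name=user]").attr("story");
System.out.println(story);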

Upvotes: 0

radicarl

Reputation: 327

<metadata name="user" story="([^"]*)" \/>

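[^"]* matches any run of characters that does not contain a ", so the capture stops at the first closing quote. In this case the string

<metadata name="user" story="My name is Nick" extraStory="something" />

will not be matched, because the pattern requires " /> immediately after the captured value.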
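
To make the difference concrete, here is a small sketch that runs the greedy pattern from the question and the negated-class pattern from this answer against both sample lines. The class name StoryRegexDemo and the helper extract are assumptions introduced for illustration; the slash is left unescaped because Java regex does not require escaping it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class StoryRegexDemo {
    // Greedy pattern from the question: (.*) extends to the LAST quote followed by " />".
    static final Pattern GREEDY =
            Pattern.compile("<metadata name=\"user\" story=\"(.*)\" />");
    // Pattern from this answer: ([^"]*) can never cross a double quote.
    static final Pattern STRICT =
            Pattern.compile("<metadata name=\"user\" story=\"([^\"]*)\" />");

    // Returns capture group 1 if the pattern is found in the line, otherwise empty.
    static Optional<String> extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String plain = "<metadata name=\"user\" story=\"My name is Nick\" />";
        String extra = "<metadata name=\"user\" story=\"My name is Nick\""
                + " extraStory=\"something\" />";

        // prints: Optional[My name is Nick" extraStory="something]
        System.out.println(extract(GREEDY, extra));
        // prints: Optional[My name is Nick]
        System.out.println(extract(STRICT, plain));
        // prints: Optional.empty
        System.out.println(extract(STRICT, extra));
    }
}
```

Note that simply making the original quantifier lazy, as in (.*?), would not help with the second line: the lazy group still has to grow until it reaches a quote followed by " />", so it ends up capturing the same overlong text.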

Upvotes: 1
