Reputation: 5628
I want a regex to find the string between two characters, but only from the start delimiter to the first occurrence of the end delimiter
I want to extract the story value from lines of the following format:
<metadata name="user" story="{some_text_here}" \/>
So I want to extract only {some_text_here}.
For that I am using the following regex:
<metadata name="user" story="(.*)" \/>
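And the Java code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws IOException {
    // Double quotes and backslashes inside a Java string literal must be escaped
    String regexString = "<metadata name=\"user\" story=\"(.*)\" \\/>";
    String filePath = "C:\\Desktop\\temp\\test.txt";
    Pattern p = Pattern.compile(regexString);
    Matcher m;
    try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
        String line;
        // Test each line of the file against the pattern
        while ((line = br.readLine()) != null) {
            m = p.matcher(line);
            if (m.find()) {
                System.out.println(m.group(1));
            }
        }
    }
}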
This regex mostly works, but surprisingly, if the line is:
<metadata name="user" story="My name is Nick" extraStory="something" />
the code prints My name is Nick" extraStory="something
whereas I only want to get My name is Nick
I also want to make sure that there is nothing else between story="My name is Nick" and the closing />
Upvotes: 0
Views: 1173
Reputation: 24802
The following XPath should solve your problem:
//metadata[@name='user' and @story and count(@*) = 2]/@story
It addresses the story attribute of any metadata node in the document whose name attribute is user and which also has a story attribute but no others (the attribute count is 2).
(Note: //metadata[@name='user' and count(@*)=2]/@story would be enough, since it is impossible to address the story attribute of a metadata node whose second attribute isn't story.)
In Java code, supposing you are handling an instance of org.w3c.dom.Document and already have an instance of XPath available, the code would be the following:
xPath.evaluate("//metadata[@name='user' and @story and count(@*) = 2]/@story", xmlDoc);
You can try the XPath here or the Java code here.
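For readers who want to run the expression end to end, here is a minimal sketch using only the JDK's built-in XML APIs. The class name StoryXPath, the helper extractStory, and the <root> wrapper element are assumptions made for illustration; the sketch also assumes the input is well-formed XML, so the tag ends with /> rather than \/>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StoryXPath {
    // Same expression as in the answer above.
    static final String EXPR =
        "//metadata[@name='user' and @story and count(@*) = 2]/@story";

    // Hypothetical helper: returns the story value, or "" when no node
    // satisfies the expression (the string form of an empty node-set).
    static String extractStory(String xml) {
        try {
            Document xmlDoc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(EXPR, xmlDoc);
        } catch (Exception e) {
            // Parsing or evaluation failed, e.g. the input is not well-formed XML.
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Exactly two attributes: the story value is returned.
        System.out.println(extractStory(
            "<root><metadata name=\"user\" story=\"My name is Nick\" /></root>"));
        // A third attribute makes count(@*) = 3, so nothing is selected.
        System.out.println(extractStory(
            "<root><metadata name=\"user\" story=\"My name is Nick\" extraStory=\"something\" /></root>")
            .isEmpty());
    }
}
```

The two-argument evaluate call returns the string value of the first selected node, so a non-matching document yields an empty string rather than null.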
Upvotes: 1
Reputation: 5423
Just use Jsoup, the right tool for the problem :).
It's this easy:
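// Read the HTML file into a string; filePath is assumed to be the path from the question
// (needs java.nio.file.Files, java.nio.file.Paths and java.nio.charset.StandardCharsets)
String html = new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);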
Document document = Jsoup.parse(html);
String story = document.select("metadata[name=user]").attr("story");
System.out.println(story);
Upvotes: 0
Reputation: 327
<metadata name="user" story="([^"]*)" \/>
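[^"]* matches any run of characters that does not contain a ", so the capture group stops at the first closing quote. In this case the string
<metadata name="user" story="My name is Nick" extraStory="something" />
will not be matched, because extraStory="something" sits between the closing quote of story and />.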
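A minimal Java sketch of this pattern, using the two sample lines from the question. The class name StoryRegex and the helper extractStory are illustrative assumptions; the slash is left unescaped because / has no special meaning in Java regexes, and the input lines are assumed to end with a space followed by />.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoryRegex {
    // [^"]* cannot cross a double quote, so group 1 ends at the first closing quote.
    static final Pattern STORY =
        Pattern.compile("<metadata name=\"user\" story=\"([^\"]*)\" />");

    // Hypothetical helper: returns the captured story, or null when the line
    // does not fit the pattern (for example, when extra attributes follow story).
    static String extractStory(String line) {
        Matcher m = STORY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints "My name is Nick"
        System.out.println(extractStory(
            "<metadata name=\"user\" story=\"My name is Nick\" />"));
        // prints "null": extraStory sits between the closing quote and />
        System.out.println(extractStory(
            "<metadata name=\"user\" story=\"My name is Nick\" extraStory=\"something\" />"));
    }
}
```

Unlike the greedy (.*) in the question, the negated character class never needs to backtrack past a quote, so a line with additional attributes is rejected outright instead of producing an over-long capture.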
Upvotes: 1