Reputation: 893
I want to write a program which reads the following input:
<repeat value="2" content="helloworld"/>
Now I need to parse and store 'repeat', '2' and 'helloworld' in different variables. So far so good. The catch is that there may be whitespace anywhere in the input, which makes the task significantly harder and beyond my abilities. I thought about using a regex, but I couldn't get one working, and my research on the topic yielded no results. What would be a clever way to do this?
Example:
< rep eat va lue=" 2" conte nt= "helloworld"/>
should match:
repeat, 2, helloworld
Upvotes: 0
Views: 95
Reputation: 1188
I would suggest using a DOM parser, for example Jsoup. Of course, the input has to be valid XML/HTML.
package com.example;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class AttributesReader {
    public static void main(String[] args) throws Exception {
        String xmlStrMessage = "<repeat value=\"2\" content=\"helloworld\"/>";
        // Parse the input and select every <repeat> element
        Document doc = Jsoup.parse(xmlStrMessage);
        Elements repeat = doc.select("repeat");
        // attr() returns the attribute value of the first matching element
        System.out.println("value:" + repeat.attr("value"));
        System.out.println("content:" + repeat.attr("content"));
    }
}
Upvotes: 0
Reputation: 6736
Use this regex to cover all possible spacings:
<\s*(\w+)\s+value\s*=\s*"(\w+)"\s*content\s*=\s*"(\w+)"\s*\/\s*>
This matches the entire tag from your first example and captures the tag name (1st group), the value (2nd group) and the content (3rd group).
Test it online at regex101.com
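For use from Java, here is a minimal sketch of how the pattern above could be applied with java.util.regex. The class name RepeatTagRegex and the parse method are made up for illustration; the only change to the regex itself is that backslashes are doubled for the Java string literal and the forward slash is left unescaped, which Java does not require.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
class RepeatTagRegex {
    // Same pattern as above, written as a Java string literal.
    private static final Pattern TAG = Pattern.compile(
        "<\\s*(\\w+)\\s+value\\s*=\\s*\"(\\w+)\"\\s*content\\s*=\\s*\"(\\w+)\"\\s*/\\s*>");

    // Returns {tag, value, content}, or null if the input does not match.
    static String[] parse(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Whitespace around the names, '=' and '/' is tolerated.
        String[] parts = parse("<  repeat value = \"2\"   content= \"helloworld\" / >");
        System.out.println(String.join(", ", parts));
    }
}
```

Note that this version only accepts whitespace between tokens. Whitespace inside a name or inside the quotes makes parse return null.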
Update:
To allow spaces even inside the keywords value and content, you can simply add a \s* (matches any number of whitespace characters, including zero) between every letter:
<\s*(.+)\s+v\s*a\s*l\s*u\s*e\s*=\s*"\s*(\w+)\s*"\s*c\s*o\s*n\s*t\s*e\s*n\s*t\s*=\s*"(.+)"\s*\/\s*>
Test it online at regex101.com
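A rough Java sketch of the same idea follows, in case it helps. It rests on a few assumptions that are mine rather than part of the regex above: the value may have whitespace inside its quotes (the example in the question has " 2"), whitespace is stripped from the captured tag name so that "rep eat" becomes "repeat", and the content is only trimmed at both ends. The class name SpacedRepeatTagRegex and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
class SpacedRepeatTagRegex {
    // \s* between every letter of "value" and "content"; lazy (.+?) groups
    // for the tag name and the content. Assumption: whitespace is also
    // allowed around the value inside its quotes.
    private static final Pattern TAG = Pattern.compile(
        "<\\s*(.+?)\\s+v\\s*a\\s*l\\s*u\\s*e\\s*=\\s*\"\\s*(\\w+)\\s*\"\\s*"
        + "c\\s*o\\s*n\\s*t\\s*e\\s*n\\s*t\\s*=\\s*\"(.+?)\"\\s*/\\s*>");

    // Returns {tag, value, content}, or null if the input does not match.
    static String[] parse(String input) {
        Matcher m = TAG.matcher(input.trim());
        if (!m.matches()) {
            return null;
        }
        // Assumption: whitespace inside the tag name is noise and is removed.
        String tag = m.group(1).replaceAll("\\s+", "");
        return new String[] { tag, m.group(2), m.group(3).trim() };
    }

    public static void main(String[] args) {
        String[] parts = parse("< rep eat va lue=\" 2\" conte nt= \"helloworld\"/>");
        System.out.println(String.join(", ", parts)); // prints "repeat, 2, helloworld"
    }
}
```

Whether whitespace inside the content should be removed as well depends on the data. If it should, the same replaceAll call can be applied to the third group.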
Upvotes: 1