Alex

Reputation: 893

Java: match a single word that may or may not be separated by spaces

I want to write a program which reads the following input:

<repeat value="2" content="helloworld"/>

I need to parse 'repeat', '2' and 'helloworld' and store them in separate variables. So far so good. The catch is that there may be whitespace anywhere in the input, which makes the task significantly harder and beyond my abilities. I thought about using a regex, but I couldn't get one working, and my research on the topic turned up nothing. What would be a clever way to do this?

Example:

< rep eat va lue=" 2"    conte nt= "helloworld"/>

From this I want to extract:

repeat, 2, helloworld

Upvotes: 0

Views: 95

Answers (2)

divideByZero

Reputation: 1188

I would suggest using a DOM parser, for example Jsoup. Of course, the input has to be valid XML/HTML.

package com.example;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class AttributesReader {
    public static void main(String[] args) {
        String xmlStrMessage = "<repeat value=\"2\" content=\"helloworld\"/>";
        // Use Jsoup's XML parser so the self-closing <repeat/> tag is treated as XML, not HTML.
        Document doc = Jsoup.parse(xmlStrMessage, "", Parser.xmlParser());
        // Select all <repeat> elements; attr() returns the value from the first match that has it.
        Elements repeat = doc.select("repeat");
        System.out.println("value:" + repeat.attr("value"));
        System.out.println("content:" + repeat.attr("content"));
    }
}
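If you would rather avoid a third-party dependency, the JDK's built-in DOM parser (javax.xml.parsers) can read the same attributes. This is a minimal sketch, assuming the input is a single well-formed XML element. XML itself allows whitespace around = and before />, but not after < or inside names, so the heavily spaced example from the question (< rep eat ...) is still rejected, and attribute values keep any inner spaces (" 2" stays " 2"). The class name DomAttributesReader and the helper parse are illustrative, not from the answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomAttributesReader {
    // Parses one well-formed XML element and returns {tag name, value, content}.
    static String[] parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return new String[] {
                root.getTagName(),
                root.getAttribute("value"),
                root.getAttribute("content")
            };
        } catch (Exception e) {
            // Thrown e.g. for "< repeat ...", which is not well-formed XML.
            throw new IllegalArgumentException("Input is not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // XML allows whitespace around '=' and before "/>", so this still parses.
        String[] parts = parse("<repeat value = \"2\"   content= \"helloworld\" />");
        System.out.println(parts[0] + ", " + parts[1] + ", " + parts[2]); // prints: repeat, 2, helloworld
    }
}
```

The checked parser exceptions are wrapped in an IllegalArgumentException so callers can use parse without a throws clause.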

Upvotes: 0

Byte Commander

Reputation: 6736

Use this regex to allow whitespace between all the tokens (after <, around the tag name, the attribute names and the = signs, and inside />):

<\s*(\w+)\s+value\s*=\s*"(\w+)"\s*content\s*=\s*"(\w+)"\s*\/\s*>

This matches your first example string in full (and variants with extra whitespace between the tokens) and returns the tag (1st group), value (2nd group) and content (3rd group).

Test it online at regex101.com
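In Java the pattern has to be written as a string literal with doubled backslashes. A minimal sketch, assuming the input is a single tag like the one in the question; the class name RepeatTagRegex and the helper parse are illustrative. It uses matches(), so the whole input must be the tag. Because \w+ cannot contain spaces, the heavily spaced example (< rep eat va lue=...) is not matched by this pattern, and parse returns null for it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatTagRegex {
    // The regex from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern TAG = Pattern.compile(
        "<\\s*(\\w+)\\s+value\\s*=\\s*\"(\\w+)\"\\s*content\\s*=\\s*\"(\\w+)\"\\s*\\/\\s*>");

    // Returns {tag, value, content}, or null if the whole input does not match.
    static String[] parse(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Extra whitespace between tokens is accepted.
        String[] parts = parse("<  repeat value = \"2\" content=\"helloworld\" / >");
        System.out.println(parts[0] + ", " + parts[1] + ", " + parts[2]); // prints: repeat, 2, helloworld
    }
}
```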


Update:

To also allow spaces inside the keywords value and content, you can add \s* (which matches any number of whitespace characters, including zero) between each pair of letters. The tag name is now captured with (.+), so it may contain spaces too:

<\s*(.+)\s+v\s*a\s*l\s*u\s*e\s*=\s*"(\w+)"\s*c\s*o\s*n\s*t\s*e\s*n\s*t\s*=\s*"(.+)"\s*\/\s*>

Test it online at regex101.com
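A sketch of applying this in Java to the spaced example from the question, under two assumptions that go beyond the pattern above: \s* is added inside the quotes around the value group, because value=" 2" would otherwise fail at the leading space, and whitespace is stripped from the tag captured by (.+), which comes back as "rep eat". The class name SpacedRepeatTagRegex and the helper parse are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpacedRepeatTagRegex {
    // Updated regex from the answer, plus \s* inside the quotes around the value group
    // (an adjustment so that value=" 2" is accepted).
    private static final Pattern TAG = Pattern.compile(
        "<\\s*(.+)\\s+v\\s*a\\s*l\\s*u\\s*e\\s*=\\s*\"\\s*(\\w+)\\s*\""
        + "\\s*c\\s*o\\s*n\\s*t\\s*e\\s*n\\s*t\\s*=\\s*\"(.+)\"\\s*\\/\\s*>");

    // Returns {tag, value, content}, or null if the whole input does not match.
    static String[] parse(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // (.+) captures the tag name with its inner spaces ("rep eat"), so remove them.
        String tag = m.group(1).replaceAll("\\s+", "");
        return new String[] { tag, m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String input = "< rep eat va lue=\" 2\"    conte nt= \"helloworld\"/>";
        String[] parts = parse(input);
        System.out.println(parts[0] + ", " + parts[1] + ", " + parts[2]); // prints: repeat, 2, helloworld
    }
}
```

The content group is still (.+), so a content value such as "hello world" keeps its inner space; only the tag name is normalized.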

Upvotes: 1
