summerNight

Reputation: 1496

Parse a multiline string in Java that includes spaces in between

I am trying to parse the following block of text, which has four spaces before each key and between the key, data type, and value:

    PYTHON_HOME    REG_SZ    C:\Python27;C:\Python27\Scripts
    PYTHON_PATH    REG_SZ    C:\tsde\Python\v34
    SCALA_HOME    REG_SZ    C:\Program Files (x86)\scala
    SZ    REG_SZ    C:\Program Files\7-Zip
    TEMP    REG_EXPAND_SZ    %USERPROFILE%\AppData\Local\Temp
    TMP    REG_EXPAND_SZ    %USERPROFILE%\AppData\Local\Temp

I would like to parse the string and assign parsed values to variables or a multi-dimensional array. The expected result that I am looking for is something like:

Key1 = PYTHON_HOME, Value1 = C:\Python27;C:\Python27\Scripts

Key2 = SCALA_HOME, Value2 = C:\Program Files (x86)\scala

Key3 = SZ, Value3 = C:\Program Files\7-Zip

Key4 = TEMP, Value4 = %USERPROFILE%\AppData\Local\Temp

Key5 = TMP, Value5 = %USERPROFILE%\AppData\Local\Temp

So far I have been experimenting with Pattern and Matcher from java.util.regex, but I haven't gotten anywhere.

Please note that the given block of text may contain more lines of keys, data types, and values.

Upvotes: 2

Views: 1466

Answers (2)

TheLostMind

Reputation: 36304

This will also work:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseRegistryBlock {
    public static void main(String[] args) {
        String s = "PYTHON_HOME    REG_SZ    C:\\Python27;C:\\Python27\\Scripts\nPYTHON_PATH    REG_SZ    C:\\tsde\\Python\\v34\nSCALA_HOME    REG_SZ    C:\\Program Files (x86)\\scala\nSZ    REG_SZ    C:\\Program Files\\7-Zip\nTEMP    REG_EXPAND_SZ    %USERPROFILE%\\AppData\\Local\\Temp\nTMP    REG_EXPAND_SZ    %USERPROFILE%\\AppData\\Local\\Temp";
        System.out.println(s);
        // One match per line: key, whitespace, data type, whitespace, value up to the line end.
        Pattern p = Pattern.compile("(?<=\\n|^)(.*?)\\s+(.*?)\\s+(.*?)(?=\\n+|$)",
                Pattern.DOTALL);
        Matcher m = p.matcher(s);
        List<List<String>> list = new ArrayList<>();
        while (m.find()) {
            List<String> temp = new ArrayList<>();
            temp.add(m.group(1)); // key
            temp.add(m.group(2)); // data type
            temp.add(m.group(3)); // value
            list.add(temp);
        }

        for (List<String> ll : list) {
            System.out.println("1 : " + ll.get(0));
            System.out.println("2 : " + ll.get(1));
            System.out.println("3 : " + ll.get(2));
        }
    }
}

Output:

PYTHON_HOME    REG_SZ    C:\Python27;C:\Python27\Scripts
PYTHON_PATH    REG_SZ    C:\tsde\Python\v34
SCALA_HOME    REG_SZ    C:\Program Files (x86)\scala
SZ    REG_SZ    C:\Program Files\7-Zip
TEMP    REG_EXPAND_SZ    %USERPROFILE%\AppData\Local\Temp
TMP    REG_EXPAND_SZ    %USERPROFILE%\AppData\Local\Temp
1 : PYTHON_HOME
2 : REG_SZ
3 : C:\Python27;C:\Python27\Scripts
1 : PYTHON_PATH
2 : REG_SZ
3 : C:\tsde\Python\v34
1 : SCALA_HOME
2 : REG_SZ
3 : C:\Program Files (x86)\scala
1 : SZ
2 : REG_SZ
3 : C:\Program Files\7-Zip
1 : TEMP
2 : REG_EXPAND_SZ
3 : %USERPROFILE%\AppData\Local\Temp
1 : TMP
2 : REG_EXPAND_SZ
3 : %USERPROFILE%\AppData\Local\Temp
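
If only the key and value are needed, as in the question's expected result, a split-based approach may be easier to read than lookarounds. The following is a minimal sketch, not part of the answer above: the class name RegistryLineSplitter and the method parse are hypothetical, and it assumes that columns are separated by four or more spaces and that keys and data types never contain spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RegistryLineSplitter {

    // Parses "KEY    TYPE    VALUE" lines into an insertion-ordered key -> value map.
    // Assumption: columns are separated by runs of four or more spaces.
    static Map<String, String> parse(String block) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String rawLine : block.split("\\R")) {
            String line = rawLine.trim(); // drops the leading indentation
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Limit 3 keeps the value intact even if it contains further space runs.
            String[] columns = line.split(" {4,}", 3);
            if (columns.length == 3) {
                result.put(columns[0], columns[2]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String block = "    PYTHON_HOME    REG_SZ    C:\\Python27;C:\\Python27\\Scripts\n"
                + "    SCALA_HOME    REG_SZ    C:\\Program Files (x86)\\scala\n"
                + "    TEMP    REG_EXPAND_SZ    %USERPROFILE%\\AppData\\Local\\Temp";
        parse(block).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Lines that do not split into exactly three columns are skipped silently in this sketch; real input may call for reporting them instead.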

Upvotes: 2

vks

Reputation: 67968

^(.*?)[ ]{4}.*?[ ]{4}(.*)$

You can use this pattern with the MULTILINE flag and read the key from group 1 and the value from group 2.

See demo.

https://regex101.com/r/wX9fR1/26

// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String line = "test_string"; // placeholder: substitute the multiline block to parse
Pattern pattern = Pattern.compile("^(.*?)[ ]{4}.*?[ ]{4}(.*)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println("group 1: " + matcher.group(1)); // key
    System.out.println("group 2: " + matcher.group(2)); // value
}
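
One caveat: if every line starts with four spaces, as in the question's block, the lazy first group of the pattern above matches the empty string and the second group ends up holding the data type together with the value. A possible variant that tolerates leading indentation is sketched below. The class name IndentedRegistryRegex, the method parse, and the adjusted pattern are assumptions for illustration, and the sketch assumes keys and data types contain no whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndentedRegistryRegex {

    // Hypothetical variant: optional leading spaces, then key, data type and value,
    // each separated by exactly four spaces.
    private static final Pattern ROW =
            Pattern.compile("^ *(\\S+) {4}(\\S+) {4}(.*)$", Pattern.MULTILINE);

    // Returns one {key, dataType, value} array per matching line.
    static List<String[]> parse(String block) {
        List<String[]> rows = new ArrayList<>();
        Matcher matcher = ROW.matcher(block);
        while (matcher.find()) {
            rows.add(new String[] {matcher.group(1), matcher.group(2), matcher.group(3)});
        }
        return rows;
    }

    public static void main(String[] args) {
        String block = "    SZ    REG_SZ    C:\\Program Files\\7-Zip\n"
                + "    TMP    REG_EXPAND_SZ    %USERPROFILE%\\AppData\\Local\\Temp";
        for (String[] row : parse(block)) {
            System.out.println(row[0] + " (" + row[1] + ") = " + row[2]);
        }
    }
}
```

Capturing the data type as its own group costs little and leaves room to treat REG_EXPAND_SZ values differently later.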

Upvotes: 2
