Reputation: 1496
I am trying to parse the following block of text, in which four spaces precede each key and separate the key, data type, and value:
PYTHON_HOME REG_SZ C:\Python27;C:\Python27\Scripts
PYTHON_PATH REG_SZ C:\tsde\Python\v34
SCALA_HOME REG_SZ C:\Program Files (x86)\scala
SZ REG_SZ C:\Program Files\7-Zip
TEMP REG_EXPAND_SZ %USERPROFILE%\AppData\Local\Temp
TMP REG_EXPAND_SZ %USERPROFILE%\AppData\Local\Temp
I would like to parse the string and assign the parsed values to variables or to a multi-dimensional array. The result I am looking for is something like:
Key1 = PYTHON_HOME, Value1 = C:\Python27;C:\Python27\Scripts
Key2 = SCALA_HOME, Value2 = C:\Program Files (x86)\scala
Key3 = SZ, Value3 = C:\Program Files\7-Zip
Key4 = TEMP, Value4 = %USERPROFILE%\AppData\Local\Temp
Key5 = TMP, Value5 = %USERPROFILE%\AppData\Local\Temp
So far I have been experimenting with Pattern and Matcher from java.util.regex, but I haven't gotten anywhere.
Please note that the block may contain more lines of keys, data types, and values.
Upvotes: 2
Views: 1466
Reputation: 36304
This will also work:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "PYTHON_HOME REG_SZ C:\\Python27;C:\\Python27\\Scripts\nPYTHON_PATH REG_SZ C:\\tsde\\Python\\v34\nSCALA_HOME REG_SZ C:\\Program Files (x86)\\scala\nSZ REG_SZ C:\\Program Files\\7-Zip\nTEMP REG_EXPAND_SZ %USERPROFILE%\\AppData\\Local\\Temp\nTMP REG_EXPAND_SZ %USERPROFILE%\\AppData\\Local\\Temp";
        System.out.println(s);
        // Each row: key, whitespace, data type, whitespace, value (up to the end of the line).
        Pattern p = Pattern.compile("(?<=\\n|^)(.*?)\\s+(.*?)\\s+(.*?)(?=\\n+|$)",
                Pattern.DOTALL);
        Matcher m = p.matcher(s);
        List<List<String>> list = new ArrayList<List<String>>();
        while (m.find()) {
            List<String> temp = new ArrayList<String>();
            temp.add(m.group(1)); // key
            temp.add(m.group(2)); // data type
            temp.add(m.group(3)); // value
            list.add(temp);
        }
        for (List<String> ll : list) {
            System.out.println("1 : " + ll.get(0));
            System.out.println("2 : " + ll.get(1));
            System.out.println("3 : " + ll.get(2));
        }
    }
}
Output:
PYTHON_HOME REG_SZ C:\Python27;C:\Python27\Scripts
PYTHON_PATH REG_SZ C:\tsde\Python\v34
SCALA_HOME REG_SZ C:\Program Files (x86)\scala
SZ REG_SZ C:\Program Files\7-Zip
TEMP REG_EXPAND_SZ %USERPROFILE%\AppData\Local\Temp
TMP REG_EXPAND_SZ %USERPROFILE%\AppData\Local\Temp
1 : PYTHON_HOME
2 : REG_SZ
3 : C:\Python27;C:\Python27\Scripts
1 : PYTHON_PATH
2 : REG_SZ
3 : C:\tsde\Python\v34
1 : SCALA_HOME
2 : REG_SZ
3 : C:\Program Files (x86)\scala
1 : SZ
2 : REG_SZ
3 : C:\Program Files\7-Zip
1 : TEMP
2 : REG_EXPAND_SZ
3 : %USERPROFILE%\AppData\Local\Temp
1 : TMP
2 : REG_EXPAND_SZ
3 : %USERPROFILE%\AppData\Local\Temp
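If a two-dimensional array is preferred over nested lists, as the question mentions, something along these lines might also do. Note that this is a different technique from the regex above: it splits each line with String.split and a limit of 3, so spaces inside the value are left alone. The class name RegistrySplit and the method parse are made up for illustration, and the sketch assumes that keys and data types never contain whitespace.

```java
import java.util.ArrayList;
import java.util.List;

final class RegistrySplit {
    // Hypothetical helper: returns one {key, dataType, value} row per well-formed line.
    static String[][] parse(String block) {
        List<String[]> rows = new ArrayList<>();
        for (String line : block.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // A limit of 3 stops splitting after the data type,
            // so a value such as a path with spaces stays in one piece.
            String[] parts = trimmed.split("\\s+", 3);
            if (parts.length == 3) {
                rows.add(parts);
            }
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String block = "    SZ    REG_SZ    C:\\Program Files\\7-Zip\n"
                + "    TEMP    REG_EXPAND_SZ    %USERPROFILE%\\AppData\\Local\\Temp";
        for (String[] row : parse(block)) {
            System.out.println(row[0] + " = " + row[2]);
        }
    }
}
```

Lines that do not have all three fields are silently dropped here; whether that is the right behaviour depends on the input.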
Upvotes: 2
Reputation: 67968
^(.*?)[ ]{4}.*?[ ]{4}(.*)$
You can simply use this pattern and read the key and the value from capture groups 1 and 2.
See demo.
https://regex101.com/r/wX9fR1/26
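// Requires java.util.regex.Pattern and java.util.regex.Matcher.
String line = "test_string"; // placeholder: substitute the block to parse
// MULTILINE makes ^ and $ match at the start and end of every line.
Pattern pattern = Pattern.compile("^(.*?)[ ]{4}.*?[ ]{4}(.*)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println("group 1: " + matcher.group(1)); // key
    System.out.println("group 2: " + matcher.group(2)); // value
}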
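For a complete, runnable illustration, here is a minimal sketch that collects each key and value into a map. It uses a variant of the pattern rather than the exact one above: it tolerates leading indentation and any run of spaces or tabs between fields, and it assumes that keys and data types contain no whitespace. The names RegistryBlockParser and parse are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegistryBlockParser {
    // Assumed line shape: optional indentation, key, data type, value.
    // MULTILINE lets ^ and $ anchor at every line of the block.
    private static final Pattern LINE =
            Pattern.compile("^[ \\t]*(\\S+)[ \\t]+(\\S+)[ \\t]+(.*)$", Pattern.MULTILINE);

    // Hypothetical helper: maps each key to its value, in input order.
    static Map<String, String> parse(String block) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = LINE.matcher(block);
        while (m.find()) {
            // group(2) holds the data type (REG_SZ, REG_EXPAND_SZ); it is not stored here.
            result.put(m.group(1), m.group(3).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String block = "    PYTHON_HOME    REG_SZ    C:\\Python27;C:\\Python27\\Scripts\n"
                + "    SCALA_HOME    REG_SZ    C:\\Program Files (x86)\\scala";
        for (Map.Entry<String, String> e : parse(block).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A LinkedHashMap keeps the keys in the order they appear; if the same key occurs twice, the later value replaces the earlier one.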
Upvotes: 2