Reputation: 11
I have an XML file with the structure shown below, and I need to select the number between the tags on the second row. In this example it is 4391190.
I have tried \"CURRENT\">(.+?)\</value>, but that did not get me any further. Can anybody help me with this?
I know regex is not the best solution, but the tool only accepts regex to select text.
Thanks
<value ref="Meterstand verbruik dagtarief" obis="99.99.99.99.99.FG" unit="30" scaler="0" type="uint" registervaluetype="CUMULATED" registertimetype="CURRENT">3692930</value>
<value ref="Meterstand verbruik nachttarief" obis="99.99.99.99.99.FG.FF" unit="30" scaler="0" type="uint" registervaluetype="CUMULATED" registertimetype="CURRENT">4391190</value>
<value ref="Meterstand injectie dagtarief" obis="99.99.99.99.99.FG" unit="30" scaler="0" type="uint" registervaluetype="CUMULATED" registertimetype="CURRENT">0</value>
<value ref="Meterstand injectie nachttarief" obis="99.99.99.99.99.FG" unit="30" scaler="0" type="uint" registervaluetype="CUMULATED" registertimetype="CURRENT">0</value>
Upvotes: 1
Views: 212
Reputation:
This regex works with Java 7 and 8 for the examples you provided:
"^.*\"CURRENT\">([0-9].*)</value>.*$"
Below is a Java test program that uses it to extract the number you want, first from a single string and then from each line of a file. The file, named testfile.xml, contained only the four example lines you provided.
I tried "(?s)^.*?\"CURRENT\">.+?.*?\"CURRENT\">(.+?)" as the regex, but it gave no output, while (?s)^.*?\"CURRENT\">.+?\<\/value>.*?\"CURRENT\">(.+?)\<\/value> contains \< and \/, which are invalid escape sequences in a Java string literal, so it could not be used with Java as written.
For simple data extraction from XML and other file formats, regular expressions can be a good solution, and sometimes the only one. I had to use this method for analysis, data extraction and construction of XML configuration files for Tomcat, WebLogic and ActiveMQ, since Perl had to be used and installing an XML parser for it was not allowed.
    package RegExamples;

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.ArrayList;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GroupCapture1 {

        // Compiled once; group 1 captures the value that follows "CURRENT"> on a single line.
        private static final Pattern VALUE =
                Pattern.compile("^.*\"CURRENT\">([0-9].*)</value>.*$");

        public static void main(String[] args) {
            String x = "<value ref=\"Meterstand verbruik nachttarief\" obis=\"99.99.99.99.99.FG.FF\" unit=\"30\" scaler=\"0\" type=\"uint\" registervaluetype=\"CUMULATED\" registertimetype=\"CURRENT\">4391190</value>";
            String v = captureGroup(x);
            System.out.println(v + "\n");

            ArrayList<String> a = extractMatchesFromFile("testfile.xml");
            for (String s : a) {
                System.out.println(s);
            }
        }

        // Returns the captured value, or null when the line does not match.
        public static String captureGroup(String s) {
            Matcher m = VALUE.matcher(s);
            return m.matches() ? m.group(1) : null;
        }

        // Reads the file line by line and collects the value from every matching line.
        public static ArrayList<String> extractMatchesFromFile(String fileName) {
            ArrayList<String> a = new ArrayList<String>();
            // try-with-resources closes the Scanner even if reading fails.
            try (Scanner input = new Scanner(new File(fileName))) {
                while (input.hasNextLine()) {
                    String v = captureGroup(input.nextLine().trim());
                    if (v != null) {
                        a.add(v);
                    }
                }
            } catch (FileNotFoundException x) {
                System.out.println(x.getMessage());
            }
            return a;
        }
    }
Upvotes: 0
Reputation: 174836
Use the below regex and get the string you want from group index 1.
(?s)^.*?\"CURRENT\">.+?\<\/value>.*?\"CURRENT\">(.+?)\<\/value>
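As a rough illustration of how this pattern could be applied from Java, here is a minimal sketch. It assumes the sample lines are held in one string; the class and method names (SecondValueRegex, extractSecond) and the shortened attributes in the sample are made up for the example. In a Java string literal the backslashes before < and / have to be dropped (or doubled), and (?s) lets . match line breaks so the pattern can span rows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecondValueRegex {

    // Same pattern as above, written for a Java string literal:
    // '<' and '/' need no escaping, and (?s) lets '.' cross line breaks.
    private static final Pattern SECOND_VALUE = Pattern.compile(
            "(?s)^.*?\"CURRENT\">.+?</value>.*?\"CURRENT\">(.+?)</value>");

    // Returns the text of the second matching value element, or null if there is none.
    static String extractSecond(String xml) {
        Matcher m = SECOND_VALUE.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample modelled on the rows in the question.
        String xml =
                "<value ref=\"a\" registertimetype=\"CURRENT\">3692930</value>\n"
              + "<value ref=\"b\" registertimetype=\"CURRENT\">4391190</value>\n"
              + "<value ref=\"c\" registertimetype=\"CURRENT\">0</value>\n";
        System.out.println(extractSecond(xml)); // prints 4391190
    }
}
```

Whether the tool in question accepts the (?s) flag or exposes group 1 is not known here, so treat this only as a way to check the pattern itself.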
Upvotes: 1
Reputation: 185730
With xmllint (if you can change your tool set):
$ xmllint --html --xpath '//value[2]/text()' xml 2>/dev/null
4391190
Regex is not the right tool to query an XML document; XPath is much better suited for this.
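If the data can be handled from Java rather than the command line, the same XPath idea could look roughly like the sketch below, using only the JDK's built-in XML APIs. The class name SecondValueXPath and the wrapping root element are assumptions for illustration: the four sample lines have no single root, which a strict XML parser requires, so the sketch adds one. If the real file already has a root, the wrapper and the XPath would need adjusting.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SecondValueXPath {

    // Wraps the fragment in a hypothetical <root> element and returns the text
    // of the second <value> child, or an empty string if there is no such element.
    static String extractSecond(String fragment) {
        try {
            String wrapped = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/root/value[2]/text()", doc);
        } catch (Exception e) {
            // Parsing or XPath failure: surface it as an unchecked exception.
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample modelled on the rows in the question.
        String fragment =
                "<value ref=\"a\" registertimetype=\"CURRENT\">3692930</value>\n"
              + "<value ref=\"b\" registertimetype=\"CURRENT\">4391190</value>\n";
        System.out.println(extractSecond(fragment)); // prints 4391190
    }
}
```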
Upvotes: 0