Reputation: 5851
str="Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;"
I don't have any control over the format in which this string is created.
I tried this, but I can't get the values for the first keys ("Tick for symbol", "timestamp_sec", etc.).
This isn't only about this specific string; I'm also curious how to parse a string with multiple regex splits in general. Any help will be appreciated.
String[] s = line.split(";");
Map<String, String> m = new HashMap<String, String>();
for (int i = 0; i < s.length; i++)
{
    String[] split = s[i].split("\\s+");
    for (String string2 : split)
    {
        // Adding key value pair to a map for further usage.
        m.put(split[0], split[1]);
    }
}
Edit
Desired output into a map:
(Tick for Symbol, .ISEQ-IDX)
(descriptor id, 1)
(timestamp_sec,20130628030105)
(timestamp_usec,384000)
(EXCH_TIME,1372388465384)
(SENDING_TIME,0)
(PRICE, 3957.890000)
(MIC, XDUBIND)
Upvotes: 0
Views: 385
Reputation: 566
Use the Pattern class from the java.util.regex package, described step by step in this Java regex tutorial:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One capture group per value, in the order the keys appear in the line.
private static final Pattern splitPattern = Pattern.compile("^Tick for symbol (.*) descriptor id (\\d+) timestamp_sec (\\d+) timestamp_usec (\\d+);EXCH_TIME (\\d+);SENDING_TIME ?(\\d+);PRICE (.*);MIC (\\w+);$");

private static String printExtracted(final String str) {
    final Matcher m = splitPattern.matcher(str);
    if (m.matches()) {
        final String tickForSymbol = m.group(1);
        final long descriptorId = Long.parseLong(m.group(2), 10);
        final long timestampSec = Long.parseLong(m.group(3), 10);
        final long timestampUsec = Long.parseLong(m.group(4), 10);
        final long exchTime = Long.parseLong(m.group(5), 10);
        final long sendingTime = Long.parseLong(m.group(6), 10);
        final double price = Double.parseDouble(m.group(7));
        final String mic = m.group(8);
        return "(Tick for Symbol, " + tickForSymbol + ")\n" +
            "(descriptor id, " + descriptorId + ")\n" +
            "(timestamp_sec, " + timestampSec + ")\n" +
            "(timestamp_usec, " + timestampUsec + ")\n" +
            "(EXCH_TIME, " + exchTime + ")\n" +
            "(SENDING_TIME, " + sendingTime + ")\n" +
            "(PRICE, " + price + ")\n" +
            "(MIC, " + mic + ")";
    } else {
        throw new IllegalArgumentException("Argument " + str + " doesn't match pattern.");
    }
}
Edit: Using group instead of replaceAll, as it makes more sense and is also faster.
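Since the question asks for a map rather than a printed string, here is a minimal sketch of the same single-pattern idea that fills a map instead. The class name TickToMap, the KEYS array and the toMap method are names made up for this illustration, and the symbol and price groups are narrowed to \S+ and [^;]+, which is an assumption about what those values can contain.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TickToMap {
    // Hypothetical helper: keys in the order their capture groups appear in the pattern.
    private static final String[] KEYS = {
        "Tick for symbol", "descriptor id", "timestamp_sec", "timestamp_usec",
        "EXCH_TIME", "SENDING_TIME", "PRICE", "MIC"
    };

    // One capture group per key; [^;]+ stops the price at its closing semicolon (assumption).
    private static final Pattern TICK = Pattern.compile(
        "^Tick for symbol (\\S+) descriptor id (\\d+) timestamp_sec (\\d+)"
        + " timestamp_usec (\\d+);EXCH_TIME (\\d+);SENDING_TIME (\\d+);"
        + "PRICE ([^;]+);MIC (\\w+);$");

    // Returns the values as strings, keyed by name, in the order they appear in the line.
    static Map<String, String> toMap(String line) {
        Matcher m = TICK.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Line doesn't match pattern: " + line);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            result.put(KEYS[i], m.group(i + 1)); // capture groups are 1-based
        }
        return result;
    }
}
```

Calling TickToMap.toMap(str) on the string from the question should give a map with the eight keys. The values stay as strings here, so the price keeps its trailing zeros; parse them with Long.parseLong or Double.parseDouble if you need numbers.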
Upvotes: 1
Reputation: 4213
How about the following? You specify a list of key/value-pattern pairs: keys are given directly as strings, values as regexes. Then you go through this list and search the text for each key followed by its value pattern; if you find it, you extract the value.
I assume the keys can be in any order, not all of them have to be present, and there might be more than one space separating them. If you know the order of the keys, you can always start each find at the place where the previous find ended. If you know all keys are obligatory, you can throw an exception if you do not find what you are looking for.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

static String test = "Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;";

// Alternating entries: a literal key, then the regex its value must match.
static List<String> patterns = Arrays.asList(
    "Tick for symbol", "\\S+",
    "descriptor id", "\\d+",
    "timestamp_sec", "\\d+",
    "timestamp_usec", "\\d+",
    "EXCH_TIME", "\\d+",
    "SENDING_TIME", "\\d+",
    "PRICE", "\\d+\\.\\d+",   // escaped dot, one or more decimal digits
    "MIC", "[^\\s;]+"         // stop before the trailing semicolon
);

public static void main(String[] args) {
    Map<String, String> map = new HashMap<>();
    for (int i = 0; i < patterns.size(); i += 2) {
        String key = patterns.get(i);
        String val = patterns.get(i + 1);
        // \Q...\E quotes the key so it is matched literally.
        String pattern = "\\Q" + key + "\\E\\s+(" + val + ")";
        Matcher m = Pattern.compile(pattern).matcher(test);
        if (m.find()) {
            map.put(key, m.group(1));
        }
    }
    System.out.println(map);
}
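For the stricter case mentioned above (keys in a known order, all of them obligatory), a rough sketch could look like this. OrderedKeyParser and parse are names invented for the example; it relies on Matcher.find(int) to resume the search where the previous match ended and on Pattern.quote to match the key literally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderedKeyParser {
    // Hypothetical helper. keysAndValuePatterns alternates: literal key, then value regex.
    static Map<String, String> parse(String text, List<String> keysAndValuePatterns) {
        Map<String, String> result = new LinkedHashMap<>();
        int from = 0; // index where the previous match ended
        for (int i = 0; i + 1 < keysAndValuePatterns.size(); i += 2) {
            String key = keysAndValuePatterns.get(i);
            String valuePattern = keysAndValuePatterns.get(i + 1);
            Matcher m = Pattern
                .compile(Pattern.quote(key) + "\\s+(" + valuePattern + ")")
                .matcher(text);
            // find(int) resets the matcher and searches from the given index onward.
            if (!m.find(from)) {
                throw new IllegalArgumentException(
                    "Key '" + key + "' not found at or after index " + from);
            }
            result.put(key, m.group(1));
            from = m.end();
        }
        return result;
    }
}
```

Because each search starts after the previous match, a key that appears out of order is reported as missing instead of being silently picked up from earlier in the line.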
Upvotes: 3
Reputation: 713
I don't think a regex will help you here; whoever designed that output String clearly didn't have splitting in mind.
I suggest simply parsing through the String with a loop and doing the whole thing manually. Alternatively, you can just look through the String for substrings (such as "Tick for symbol"), then take whatever word comes after (until the next space), since the second parameter always seems to be one word.
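A minimal sketch of that substring approach, with no regex at all. ManualTickParser and valueAfter are made-up names, and the code assumes exactly one space between a key and its value and that a value ends at the next space or semicolon.

```java
class ManualTickParser {
    // Hypothetical helper: returns the word that follows the key, or null if the key is absent.
    static String valueAfter(String text, String key) {
        int keyStart = text.indexOf(key + " ");
        if (keyStart < 0) {
            return null;
        }
        int valueStart = keyStart + key.length() + 1; // skip the key and the single space
        int valueEnd = valueStart;
        // Advance until the next space, semicolon, or end of the string.
        while (valueEnd < text.length()) {
            char c = text.charAt(valueEnd);
            if (c == ' ' || c == ';') {
                break;
            }
            valueEnd++;
        }
        return text.substring(valueStart, valueEnd);
    }
}
```

You would call valueAfter(str, "Tick for symbol"), valueAfter(str, "PRICE") and so on, once per key, and put each result into the map yourself.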
Upvotes: 1