Reputation: 1015
I am working with a diff library in Java that puts square brackets around diffs when several diffs of the same type exist, and no square brackets when there is only one diff.
Examples of multiple diffs are "Diff(4, L3,L4,L5,L6, 119LNS ], [ )" and "Diff(2, R43,R46, 51k ], [ 2, R44,R47, 10k ], [ 2, R45,R48, 1k ], [ )". Examples of single diffs are "Diff(PBSS306NZ,135)" and "Diff(4, L3,L4,L5,L6, 119LNS ], [ )".
I want to extract just the diffs from these strings, e.g. "4, L3,L4,L5,L6, 119LNS" instead of "Diff(4, L3,L4,L5,L6, 119LNS ], [ )". I have looked at some similar questions on here, but the regexes in them don't do what I need. I tried "\[[^\]]\]" and "\[.?\]+", but they don't work. Any help from the regex experts will be appreciated.
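Both attempted patterns allow at most one character between the brackets: `[^\]]` matches exactly one non-`]` character and `.?` matches zero or one character. Below is a minimal sketch that adds a quantifier and also accepts "Diff(" as an opening marker. It assumes, based only on the sample strings above, that a bracketed diff body starts right after "Diff(" or "[ ", ends right before " ]", and never itself contains "]"; a string with no brackets at all is treated as a single "Diff(...)". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiffExtractor {
    // Body starts after "Diff(" or "[ " and runs up to (not including) " ]".
    // [^\]]+ cannot cross a ']', so each match stops at the first closing bracket.
    private static final Pattern BRACKETED =
            Pattern.compile("(?:Diff\\(|\\[ )([^\\]]+) \\]");

    // A single diff with no brackets anywhere, e.g. Diff(PBSS306NZ,135)
    private static final Pattern PLAIN =
            Pattern.compile("^Diff\\(([^\\[\\]]*)\\)$");

    public static List<String> extractDiffs(String s) {
        List<String> result = new ArrayList<>();
        Matcher plain = PLAIN.matcher(s);
        if (plain.matches()) {
            result.add(plain.group(1));
            return result;
        }
        Matcher m = BRACKETED.matcher(s);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the diff body without wrappers
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Diff(4, L3,L4,L5,L6, 119LNS ], [ )",
            "Diff(2, R43,R46, 51k ], [ 2, R44,R47, 10k ], [ 2, R45,R48, 1k ], [ )",
            "Diff(PBSS306NZ,135)"
        };
        for (String s : samples) {
            for (String diff : extractDiffs(s)) {
                System.out.println(diff);
            }
        }
    }
}
```

For the three samples this prints one diff per line: "4, L3,L4,L5,L6, 119LNS", then the three R-diffs, then "PBSS306NZ,135". The trailing "[ )" is skipped because no " ]" follows it.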
I have uploaded a sample output file at https://rapidshare.com/#!download|869l36|460197924|regextest.txt|1
Upvotes: 2
Views: 278
Reputation: 13574
Bernard,
Regarding your additional question in the comment on WhiteFang34's most excellent answer.
http://www.regular-expressions.info/ is THE most regilicious web resource on the planet. They cover ALL things regex, with correct, accessible explanations and detailed worked examples.
In many cases their coverage is better than the authors' original documentation (especially true of Java, sadly). And they cover all languages that support regular expressions, impartially.
Also: check out their Tools section. They've got an interactive regex tester. USE IT any time you need to develop a non-trivial regex. Think "IDE for regexes". It's magic (IMHO). And I've just discovered their automatic regex generator, which even seems to sort of work.
Anyway, the site is a godsend, just for the clarity of its explanations.
Cheers. Keith.
Upvotes: 0
Reputation: 13574
Bernard,
This might contain a few pointers to get you going along the right track.
```java
package forums;

public class RegexTest2 {
    public static void main(String[] args) {
        final String expected = "4, L3,L4,L5,L6, 119LNS";
        String actual = "Diff(4, L3,L4,L5,L6, 119LNS ], [ )"
                // strip the leading "Diff(" plus an optional " ], [ "
                .replaceAll("^Diff\\(( \\], \\[ )?", "")
                // strip any trailing run of '[', ']', ',', ' ' and ')'
                .replaceAll("[\\[\\], )]*$", "");
        // explicit check: a plain 'assert' is ignored unless the JVM runs with -ea
        if (!expected.equals(actual)) {
            throw new AssertionError(actual);
        }
        System.out.println("Correct result: " + actual);
    }
}
```
Yup, there's a LOT of guessing going on here... because I don't really know WHAT you want to match... and probably more importantly: everything that you want to NOT match.
Cheers. Keith.
EDIT: Now that I think of it, we're using a bomb where a hammer will do. That is: we're trying to use REGEX (a general-purpose pattern matcher) when all we REALLY want is a simple "remove any and all of these characters from the start and end of a string". Surely a custom method would be a cleaner approach, even if it's a bit more code.
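A custom method along those lines might look like the sketch below. It assumes each input holds exactly one diff, that "Diff(" is a literal prefix, and that the only wrapper characters are '[', ']', ',', ' ' and ')'; the class name StripDiff and the JUNK character set are hypothetical choices made for illustration.

```java
public class StripDiff {
    private static final String PREFIX = "Diff(";
    // Characters treated as wrapper debris around the diff body (assumption)
    private static final String JUNK = "[], )";

    public static String extract(String s) {
        // Remove the literal prefix with startsWith instead of a regex
        String body = s.startsWith(PREFIX) ? s.substring(PREFIX.length()) : s;
        int start = 0;
        int end = body.length();
        // Skip wrapper characters at the front...
        while (start < end && JUNK.indexOf(body.charAt(start)) >= 0) {
            start++;
        }
        // ...and trim them from the back
        while (end > start && JUNK.indexOf(body.charAt(end - 1)) >= 0) {
            end--;
        }
        return body.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extract("Diff(4, L3,L4,L5,L6, 119LNS ], [ )")); // 4, L3,L4,L5,L6, 119LNS
        System.out.println(extract("Diff(PBSS306NZ,135)"));                // PBSS306NZ,135
    }
}
```

It is a few more lines than the two replaceAll calls, but every step is a plain character comparison, so there is no escaping to get wrong.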
Upvotes: 1
Reputation: 72039
I believe this does what you're looking for:
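```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiffExtract {
    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("regextest.txt");
        StringBuilder sb = new StringBuilder();
        // Split on LF or CRLF so no stray '\r' ends up in the combined string
        try (Scanner scanner = new Scanner(file).useDelimiter("\\r?\\n")) {
            while (scanner.hasNext()) {
                String line = scanner.next();
                line = line.replaceAll("^Diff\\(", ""); // drop the leading "Diff("
                line = line.replaceAll("\\)$", "");     // drop the trailing ")"
                sb.append(line);
            }
        }
        String combined = sb.toString();
        // Lazy .+? stops each match at the first ']' after the '['
        Pattern pattern = Pattern.compile("\\[.+?\\]");
        Matcher matcher = pattern.matcher(combined);
        while (matcher.find()) {
            String extract = matcher.group();
            extract = extract.replaceAll("\\[ ?", ""); // strip '[' and one optional space
            extract = extract.replaceAll(" ?\\]", ""); // strip one optional space and ']'
            System.out.println(extract);
        }
    }
}
```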
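The two replaceAll calls inside the loop can be folded into the pattern with a capture group. This is a minimal sketch assuming, as the pattern above does, that every diff in the combined string is wrapped as "[ ... ]"; like the original, it will not find a diff that has no opening "[". The class name BracketGroups is hypothetical, and the input in main is a hand-made string rather than the contents of regextest.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketGroups {
    // Group 1 captures the text between "[" (plus an optional space) and an optional space before "]"
    private static final Pattern DIFF = Pattern.compile("\\[ ?(.+?) ?\\]");

    public static List<String> extract(String combined) {
        List<String> diffs = new ArrayList<>();
        Matcher m = DIFF.matcher(combined);
        while (m.find()) {
            diffs.add(m.group(1)); // already stripped of brackets and padding
        }
        return diffs;
    }

    public static void main(String[] args) {
        for (String d : extract("[ 2, R44,R47, 10k ], [ 2, R45,R48, 1k ]")) {
            System.out.println(d);
        }
    }
}
```

Because the lazy group stops as soon as " ]" can match, the padding space lands outside group 1 and no post-processing is needed.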
For your regextest.txt file, the output looks like:
12, C1,C4,C5,C6,C9,C10,C15,C18,C19,C20,C23,C24, C0603, 10nF
10, C2,C3,C7,C8,C13,C16,C17,C21,C22,C27, C0603, 100nF
2, C11,C25, SMT, 1uF LOW ESR 50V
4, C12,C14,C26,C28, C0805, 2u2
4, D1,D2,D4,D9, SOT23, BAS40-04/SOT
4, D3,D5,D6,D7, SMB, SMBJ5.0A
1, D8, SMB, SMBJ15A
2, D10,D11, SMB, SMBJ30A
1, J1, SMT, CON12
2, L1,L2, SMT, 744043471, 470uH
4, L3,L4,L5,L6, 119LNS
...
Upvotes: 2