Reputation: 1071
In the following code I need to get table names as tokens using SQL reserved words as separators. The tokens should contain the table name or the table name followed by a suffix.
For example, given
table1 t1 inner join table2 t2 outer join table3
The code should return three tokens:
Token 1: table1 t1
Token 2: table2 t2
Token 3: table3
This code instead uses the first reserved word as token, without discarding any other following reserved words:
String str = "table1 t1 inner join table2 t2 outer join table3";
String [] tokens = sql2.split("\\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s");
for (int i = 0; i<tokens.length; i++)
System.out.println("Token "+(i+1)+":"+tokens[i]);
This returns:
Token 1:table1 t1
Token 2:join table2 t2
Token 3:join table3
What is the problem and how to make this work?
Upvotes: 2
Views: 396
Reputation: 425063
This works for the general case for a series of joined tables:
String[] tableNames = str.split(" (?=inner|outer|left|join|right|cross|full).*?join ");
Upvotes: 2
Reputation: 79085
You can use (?:\w+\s+){1,2}(?=inner|outer join)|(?<=inner join)(?:\s+\w+){1,2}|(?<=outer join)(?:\s+\w+){1,2}
as the regex.
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
// Test strings
String[] arr = { "table1 t1 inner join table2 t2 outer join table3",
"table1 t1 inner join table2 t2 outer join table3 t3 inner join table2" };
Pattern pattern = Pattern.compile(
"(?:\\w+\\s+){1,2}(?=inner|outer join)|(?<=inner join)(?:\\s+\\w+){1,2}|(?<=outer join)(?:\\s+\\w+){1,2}");
for (String s : arr) {
System.out.println("Processing: " + s);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group().trim());
}
}
}
}
Output:
Processing: table1 t1 inner join table2 t2 outer join table3
table1 t1
table2 t2
table3
Processing: table1 t1 inner join table2 t2 outer join table3 t3 inner join table2
table1 t1
table2 t2
table3 t3
table2
Upvotes: 1
Reputation: 119
I've tested your pattern (unescaped, it's \son\s|\sinner\s|\souter\s|\sjoin\s
) against this test string: table1 t1 inner join table2 t2 outer join table3
on regex101.com and the only match I got is for inner and outer. So since you're splitting the string by these tokens, you get your result.
Perhaps this can help you for your specific case. I have went for a regex approach, instead of splitting the data.
public class PatternChecker {
public static void main(String[] args) {
String str = "table1 t1 inner join table2 t2 outer join table3";
Pattern p = Pattern.compile("(table[0-9]+( [a-zA-Z0-9]+ )?)");
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group(0));
}
}
}
Later edit
The split pattern \\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s
did not work because of the mandatory whitespaces used.
For instance, you are searching for *on*
or *inner*
or *outer*
or *join*
(whitespaces are marked with an asterisk). The whitespaces are part of the keywords you're splitting with. *join*
could not be matched since its left-side whitespace was already picked up by the *outer*
and *inner*
right-side whitespace matches.
Going back to the split solution, one fix would be to mark the left-side whitespace of join as optional via the ?
quantifier; this would be the new pattern: \\son\\s|\\sinner\\s|\\souter\\s|\\s?join\\s
. This yields some empty tokens that can be filtered out
Another idea would be to consider aggregations using join (i.e. inner join, outer join) as full search criteria, which would lead to \\son\\s|\\sinner join\\s|\\souter join\\s
. No empty tokens are generated.
public class PatternChecker {
public static void main(String[] args) {
String str = "employee t1 inner join department t2 outer join job join table4 history on a=b";
String[] tokens = str.split("\\son\\s|\\sinner join\\s|\\souter join\\s|\\sjoin\\s");
for(String token : tokens) {
System.out.println(token);
}
// Output
// employee t1
// department t2
// job
// table4 history
// a=b
}
}
Note that, since you're also including on
, you can filter out all the matched tokens containing the equals symbol.
For a generic fix, you would need to isolate the string contained between from
and where
and apply the idea above.
Upvotes: 0