ps0604
ps0604

Reputation: 1071

Using Java split to get table names in an SQL statement

In the following code I need to get table names as tokens using SQL reserved words as separators. The tokens should contain the table name or the table name followed by a suffix.

For example, given

table1 t1 inner join table2 t2 outer join table3

The code should return three tokens:

Token 1: table1 t1
Token 2: table2 t2 
Token 3: table3

This code instead uses the first reserved word as token, without discarding any other following reserved words:

    String str = "table1 t1 inner join table2 t2 outer join table3";
    String [] tokens = sql2.split("\\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s");
    
    for (int i = 0; i<tokens.length; i++)
         System.out.println("Token "+(i+1)+":"+tokens[i]);

This returns:

Token 1:table1 t1
Token 2:join table2 t2 
Token 3:join table3

What is the problem and how to make this work?

Upvotes: 2

Views: 396

Answers (4)

Bohemian
Bohemian

Reputation: 425063

This works for the general case for a series of joined tables:

String[] tableNames = str.split(" (?=inner|outer|left|join|right|cross|full).*?join ");

Upvotes: 2

Arvind Kumar Avinash
Arvind Kumar Avinash

Reputation: 79085

You can use (?:\w+\s+){1,2}(?=inner|outer join)|(?<=inner join)(?:\s+\w+){1,2}|(?<=outer join)(?:\s+\w+){1,2} as the regex.

Demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Test strings
        String[] arr = { "table1 t1 inner join table2 t2 outer join table3",
                "table1 t1 inner join table2 t2 outer join table3 t3 inner join table2" };
        Pattern pattern = Pattern.compile(
                "(?:\\w+\\s+){1,2}(?=inner|outer join)|(?<=inner join)(?:\\s+\\w+){1,2}|(?<=outer join)(?:\\s+\\w+){1,2}");
        for (String s : arr) {
            System.out.println("Processing: " + s);
            Matcher matcher = pattern.matcher(s);
            while (matcher.find()) {
                System.out.println(matcher.group().trim());
            }
        }
    }
}

Output:

Processing: table1 t1 inner join table2 t2 outer join table3
table1 t1
table2 t2
table3
Processing: table1 t1 inner join table2 t2 outer join table3 t3 inner join table2
table1 t1
table2 t2
table3 t3
table2

Upvotes: 1

x sylver
x sylver

Reputation: 119

I've tested your pattern (unescaped, it's \son\s|\sinner\s|\souter\s|\sjoin\s) against this test string: table1 t1 inner join table2 t2 outer join table3 on regex101.com and the only match I got is for inner and outer. So since you're splitting the string by these tokens, you get your result.

Perhaps this can help you for your specific case. I have went for a regex approach, instead of splitting the data.

public class PatternChecker {

    public static void main(String[] args) {
        String str = "table1 t1 inner join table2 t2 outer join table3";
        Pattern p = Pattern.compile("(table[0-9]+( [a-zA-Z0-9]+ )?)");
        Matcher m = p.matcher(str);

        while(m.find()) {
            System.out.println(m.group(0));
        }
    }

}

Later edit

The split pattern \\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s did not work because of the mandatory whitespaces used. For instance, you are searching for *on* or *inner* or *outer* or *join* (whitespaces are marked with an asterisk). The whitespaces are part of the keywords you're splitting with. *join* could not be matched since its left-side whitespace was already picked up by the *outer* and *inner* right-side whitespace matches.

Going back to the split solution, one fix would be to mark the left-side whitespace of join as optional via the ? quantifier; this would be the new pattern: \\son\\s|\\sinner\\s|\\souter\\s|\\s?join\\s. This yields some empty tokens that can be filtered out

Another idea would be to consider aggregations using join (i.e. inner join, outer join) as full search criteria, which would lead to \\son\\s|\\sinner join\\s|\\souter join\\s. No empty tokens are generated.

public class PatternChecker {

    public static void main(String[] args) {
        String str = "employee t1 inner join department t2 outer join job join table4 history on a=b";

        String[] tokens = str.split("\\son\\s|\\sinner join\\s|\\souter join\\s|\\sjoin\\s");

        for(String token : tokens) {
            System.out.println(token);
        }

        // Output
        // employee t1
        // department t2
        // job
        // table4 history
        // a=b

    }

}

Note that, since you're also including on, you can filter out all the matched tokens containing the equals symbol.

For a generic fix, you would need to isolate the string contained between from and where and apply the idea above.


Upvotes: 0

Badr B
Badr B

Reputation: 1413

I found this question that is essentially the same as yours.

You can mostly solve this problem by going through a regex nightmare, or you can try to use an external library like Zql which actually parses the SQL statement for you.

Upvotes: 0

Related Questions