user3373589
user3373589

Reputation: 13

Extract all cell references in excel with regex in java

have excel data and formulars like:

and so on.

I want to get all the cell references out of the strings. Like in this Example: "E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)"

I want the output substring like "E10 E11 ... E21" or separated with ",".

I tested a lot with regex but can't get a valid result. I am using this code:

String formulaString = "E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)";
Pattern pattern = Pattern.compile("REGEX");
Matcher matcher = pattern.matcher(formulaString);

I have tried the following regex:

http://social.msdn.microsoft.com/Forums/en-US/815e819c-f0f2-4a53-8407-98b0f7f116e2/regex-to-extract-list-of-cell-references-from-excel-formula?forum=csharpgeneral

REGEX: (\w+|)?\$?(?:\bXF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b(:\s?\$?(?:\bXF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b)?

http://social.msdn.microsoft.com/Forums/en-US/dc179984-4fc8-4346-90e8-1649a23b6afe/regex-solution-to-id-excel-cell-references-in-an-excel-formula-string?forum=regexp

REGEX: \$?\b([A-Z]|[A-H][A-Z]|I[A-V])\$?([1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-6])\b([:\s]\$?\b([A-Z]|[A-H][A-Z]|I[A-V])\$?([1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-6])\b)?

For some of the formular they are working, but not for all.

I hope anyone can help me or give me a tip :)

Upvotes: 0

Views: 1245

Answers (2)

Solace
Solace

Reputation: 2189

public static void main(String[]args){
String formula = "E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)";
String output="";


for(String c: formula.split("[^A-z0-9]+")){
    if(isCell(c)){
               output+=c+" ";
    }
}


}

private static boolean isCell(String current){
   boolean hasLetter = false;
   boolean hasNumber = false;

for(int i=0; i<current.length() && (!hasLetter || !hasNumber); i++){


    if(current.charAt(i)>=65 && current.charAt(i)<=90){


        hasLetter=true;
    }
    else if(current.charAt(i)>='0' && current.charAt(i)<='9'){

        hasNumber=true;
    }
}

return hasLetter && hasNumber;
}

Upvotes: 0

uyuyuy99
uyuyuy99

Reputation: 353

On the first thread you linked to, the regex you put below the link is nowhere to be found on the page. Were you really using a regex from that page? The regex that was suggested was:

(\w+|)?\$?(?:\bXF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b(:\s?\$?(?:\bXF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b)?

Try that.

Also, it would help to know what specific strings don't get matched correctly, since you mentioned that some of them work.

Upvotes: 0

Related Questions