Extracting variables from mathematical equation

I have a string like this:

a+(b * 6) <= cat*45 && cat = dog

I am trying to extract the variables a, b, cat, dog. Below is my code.

        Set<String> varList = null; 
        StringBuilder sb = null; 
        String expression = "a+(b * 6) <= cat*45 && cat = dog";
        if (expression!=null)
        {
            sb = new StringBuilder(); 

            //list that will contain encountered words,numbers, and white space
            varList = new HashSet<String>();

            Pattern p = Pattern.compile("[A-Za-z\\s]");
            Matcher m = p.matcher(expression);

            //while matches are found 
            while (m.find())
            {
                //add words/variables found in the expression 
                sb.append(m.group());
            }//end while 

            //split the expression based on white space 
            String [] splitExpression = sb.toString().split("\\s");
            for (int i=0; i<splitExpression.length; i++)
            {
                varList.add(splitExpression[i]);
            }
        }

        Iterator iter = varList.iterator();
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }

Output I'm getting is:

ab
cat
dog

Required output:

a
b
cat
dog

The variables may or may not be separated by white space. When they are, the output is correct, but when they are not, adjacent variables get merged (a and b come out as ab). Can someone suggest the proper Pattern?

Upvotes: 2

Views: 1586

Answers (5)

Jason Qiao Meng

Reputation: 37

I believe you should replace your regexp with "[A-Za-z]+". I tried it in Python:

>>> re.findall('[A-Za-z]+', 'a+(b * 6) <= cat*45 && cat = dog')
['a', 'b', 'cat', 'cat', 'dog']
>>>

Next, put the result list into a set to remove the duplicates:

>>> rs = set(re.findall('[A-Za-z]+', 'a+(b * 6) <= cat*45 && cat = dog'))
>>> for w in rs:
...     print w,
...
a b dog cat
>>>
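
Since the question is about Java, the same find-all-then-deduplicate idea can be sketched there too. This is only a sketch: it assumes Java 9 or later (for `Matcher.results()`), and the class and method names are made up for illustration. A `LinkedHashSet` is used so the names come out in order of first appearance, which a plain `HashSet` does not guarantee.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FindAllSketch {
    // Hypothetical helper: rough Java counterpart of set(re.findall(regex, input)).
    public static Set<String> findAllDistinct(String regex, String input) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()                  // Java 9+: stream of every match
                .map(MatchResult::group)    // keep the matched text only
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        String expression = "a+(b * 6) <= cat*45 && cat = dog";
        // prints [a, b, cat, dog]
        System.out.println(findAllDistinct("[A-Za-z]+", expression));
    }
}
```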

Upvotes: 1

amith

Reputation: 409

Fully working code

public static void main(String[] args) {
    Set<String> varList = null; 
    StringBuilder sb = null; 
    String expression = "a+(b * 6) <= cat*45 && cat = dog";
    if (expression!=null)
    {
        sb = new StringBuilder(); 

        //set that will hold the distinct variable names
        varList = new HashSet<String>();

        Pattern p = Pattern.compile("[A-Za-z\\s]+");
        Matcher m = p.matcher(expression);

        //while matches are found 
        while (m.find())
        {
            //add words/variables found in the expression 
            sb.append(m.group());
            sb.append(",");
        }//end while 

        //split the collected matches on the comma delimiter
        String [] splitExpression = sb.toString().split(",");
        for (int i=0; i<splitExpression.length; i++)
        {
            if(!splitExpression[i].isEmpty() && !splitExpression[i].equals(" "))
                varList.add(splitExpression[i].trim());
        }
    }

    Iterator iter = varList.iterator();
    while (iter.hasNext()) {
        System.out.println(iter.next());
    }
}

Upvotes: 0

rock321987

Reputation: 11042

This regex should work. It assumes a variable name starts with an uppercase or lowercase letter, which can be followed by any number of letters, digits, or underscores:

\b[A-Za-z]\w*\b


Java Code

Set<String> set = new HashSet<String>();
String line = "a+(b * 6) <= cat*45 && cat = dog";
String pattern = "\\b([A-Za-z]\\w*)\\b";

Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);

while (m.find()) {
    set.add(m.group());
}
System.out.println(set);
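
To see where this pattern differs from a plain `[A-Za-z]+`, here is a small comparison sketch. The input expression with digits and underscores in its names is invented for illustration, as are the class and method names; the question itself only uses purely alphabetic names, for which both patterns give the same result.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierPatternDemo {
    // Hypothetical helper: collect distinct matches in order of first appearance.
    public static Set<String> extract(String regex, String input) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up expression whose names contain digits and an underscore.
        String expr = "x1 + total_sum*2 <= rate2";

        // Letters only: names are cut at digits and underscores.
        System.out.println(extract("[A-Za-z]+", expr));           // [x, total, sum, rate]

        // Letter followed by word characters: names stay whole.
        System.out.println(extract("\\b[A-Za-z]\\w*\\b", expr));  // [x1, total_sum, rate2]
    }
}
```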


Upvotes: 1

user2705585

Reputation:

If your variables are simply strings of letters, you can match them with a simple regex like this:

Regex: [A-Za-z]+


Upvotes: 1

Andreas

Reputation: 159215

Why use a regex find() loop to extract words, then concatenate them all into a string just to split that string again?

Just use the words found by the regex.

Well, that is, after removing whitespace (\\s) from the regex and making it match entire words (+), of course.

Pattern p = Pattern.compile("[A-Za-z]+");
Matcher m = p.matcher(expression);
while (m.find())
{
    varList.add(m.group());
}
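
For completeness, the loop above might be wrapped into a self-contained program along these lines. The class name, method name, and the choice of `LinkedHashSet` (to print names in order of first appearance) are assumptions for the sketch, not part of the question's code.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableExtractor {
    // One or more letters: matches a whole word instead of a single character.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Hypothetical helper: returns the distinct variable names in the expression.
    public static Set<String> extractVariables(String expression) {
        Set<String> varList = new LinkedHashSet<>();
        if (expression == null) {
            return varList; // nothing to scan
        }
        Matcher m = WORD.matcher(expression);
        while (m.find()) {
            varList.add(m.group()); // the set silently drops the repeated "cat"
        }
        return varList;
    }

    public static void main(String[] args) {
        // prints a, b, cat, dog on separate lines
        for (String name : extractVariables("a+(b * 6) <= cat*45 && cat = dog")) {
            System.out.println(name);
        }
    }
}
```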

Upvotes: 3
