Reputation: 189
I've a string like
a+(b * 6) <= cat*45 && cat = dog
I am trying to extract the variables a, b, cat, dog. Below is my code.
Set<String> varList = null;
StringBuilder sb = null;
String expression = "a+(b * 6) <= cat*45 && cat = dog";
if (expression!=null)
{
sb = new StringBuilder();
//list that will contain encountered words,numbers, and white space
varList = new HashSet<String>();
Pattern p = Pattern.compile("[A-Za-z\\s]");
Matcher m = p.matcher(expression);
//while matches are found
while (m.find())
{
//add words/variables found in the expression
sb.append(m.group());
}//end while
//split the expression based on white space
String [] splitExpression = sb.toString().split("\\s");
for (int i=0; i<splitExpression.length; i++)
{
varList.add(splitExpression[i]);
}
}
Iterator iter = varList.iterator();
while (iter.hasNext()) {
System.out.println(iter.next());
}
Output I'm getting is:
ab
cat
dog
Required output:
a
b
cat
dog
The variables may or may not be separated by white space. When they are, the output is correct, but when they are not (as with a and b above), they are merged into a single word. Can someone suggest the proper Pattern?
Upvotes: 2
Views: 1586
Reputation: 37
I believe you should replace your regexp with "[A-Za-z]+". I just simulated it in Python
>>> re.findall('[A-Za-z]+', 'a+(b * 6) <= cat*45 && cat = dog')
['a', 'b', 'cat', 'cat', 'dog']
>>>
Next, put the result list into a set:
>>> rs = set(re.findall('[A-Za-z]+', 'a+(b * 6) <= cat*45 && cat = dog'))
>>> for w in rs:
... print w,
...
a b dog cat
>>>
Upvotes: 1
Reputation: 409
Fully working code
public static void main(String[] args) {
Set<String> varList = null;
StringBuilder sb = null;
String expression = "a+(b * 6) <= cat*45 && cat = dog";
if (expression!=null)
{
sb = new StringBuilder();
//list that will contain encountered words,numbers, and white space
varList = new HashSet<String>();
Pattern p = Pattern.compile("[A-Za-z\\s]+");
Matcher m = p.matcher(expression);
//while matches are found
while (m.find())
{
//add words/variables found in the expression
sb.append(m.group());
sb.append(",");
}//end while
//split the collected matches on the comma separator
String [] splitExpression = sb.toString().split(",");
for (int i=0; i<splitExpression.length; i++)
{
if(!splitExpression[i].isEmpty() && !splitExpression[i].equals(" "))
varList.add(splitExpression[i].trim());
}
}
Iterator<String> iter = varList.iterator();
while (iter.hasNext()) {
System.out.println(iter.next());
}
}
Upvotes: 0
Reputation: 11042
This regex should work. A variable name may start with an uppercase or lowercase letter and may then contain digits, underscores, and uppercase or lowercase letters:
\b[A-Za-z]\w*\b
Java Code
Set<String> set = new HashSet<String>();
String line = "a+(b * 6) <= cat*45 && cat = dog";
String pattern = "\\b([A-Za-z]\\w*)\\b";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
set.add(m.group());
}
System.out.println(set);
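The practical difference from a letters-only pattern shows up once names contain digits or underscores. Here is a small sketch of that comparison; the class name IdentifierPatterns, the helper findAll, and the input string are made up for illustration and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierPatterns {
    // Hypothetical helper: collects every match of the regex, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "x1 + total_sum * 45 <= y_2"; // made-up input, not from the question

        // Letter first, then any word characters (letters, digits, underscore).
        System.out.println(findAll("\\b[A-Za-z]\\w*\\b", line)); // prints [x1, total_sum, y_2]

        // Letters only: names with digits or underscores get cut into pieces.
        System.out.println(findAll("[A-Za-z]+", line)); // prints [x, total, sum, y]
    }
}
```

If the variables never contain digits or underscores, both patterns give the same result on the question's expression.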
Upvotes: 1
Reputation:
If your variables are simply strings of letters, you can search for them with a simple regex like this.
Regex: [A-Za-z]+
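A minimal Java sketch of that search, assuming variable names consist of ASCII letters only. The class name VariableExtractor and the method extractVariables are made up for illustration; a LinkedHashSet is used here so duplicates are dropped while first-seen order is kept.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableExtractor {
    // One or more ASCII letters; assumes variable names contain letters only.
    private static final Pattern VARIABLE = Pattern.compile("[A-Za-z]+");

    // Hypothetical helper: returns the distinct variable names in first-seen order.
    static Set<String> extractVariables(String expression) {
        Set<String> variables = new LinkedHashSet<>();
        if (expression == null) {
            return variables; // nothing to scan
        }
        Matcher m = VARIABLE.matcher(expression);
        while (m.find()) {
            variables.add(m.group()); // each match is already a whole word
        }
        return variables;
    }

    public static void main(String[] args) {
        System.out.println(extractVariables("a+(b * 6) <= cat*45 && cat = dog"));
        // prints [a, b, cat, dog]
    }
}
```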
Upvotes: 1
Reputation: 159215
Why use a regex find() loop to extract words, then concatenate them all into a string just to split that string again?
Just use the words found by the regex.
Well, that is, after removing whitespace (\\s) from the character class and making the pattern match entire words (+), of course.
Pattern p = Pattern.compile("[A-Za-z]+");
Matcher m = p.matcher(expression);
while (m.find())
{
varList.add(m.group());
}
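To see why the concatenate-then-split approach in the question yields ab, it may help to print the intermediate string. The sketch below reproduces only that concatenation step; the class name ConcatenationDemo and the helper concatenateMatches are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConcatenationDemo {
    // Hypothetical helper: joins every match of the regex into one string,
    // the same way the question's loop appends to its StringBuilder.
    static String concatenateMatches(String regex, String expression) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(expression);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String expression = "a+(b * 6) <= cat*45 && cat = dog";
        // The question's pattern matches single letters and single whitespace
        // characters. "a" and "b" are separated only by "+(", which is skipped,
        // so they land next to each other in the joined string.
        String joined = concatenateMatches("[A-Za-z\\s]", expression);
        System.out.println("[" + joined + "]"); // brackets make the whitespace visible
    }
}
```

Splitting that joined string on whitespace can no longer tell a and b apart, which is why adding each match straight to the set avoids the problem.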
Upvotes: 3