Ruben Steins

Reputation: 2820

Why is this seemingly correct Regex not working correctly in Rascal?

I have the following code:

set[str] noNnoE = { v | str v <- eu, (/\b[^eEnN]*\b/ := v) };

The goal is to select, from a set of strings (called 'eu'), those strings that contain neither 'e' nor 'n' (in upper- or lowercase). The regular expression I've provided:

/\b[^eEnN]?\b/

seems to work as it should when I try it out in an online regex tester.

When I try it out in the Rascal terminal, it doesn't seem to work:

 rascal>/\b[^eEnN]*\b/ := "Slander";
 bool: true

I expected no match. What am I missing here? I'm using the latest stable Rascal release in Eclipse Oxygen.1a.

Upvotes: 1

Views: 349

Answers (2)

Jorgos Korres

Reputation: 71

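The solution from the other answer (/\b[^en]+\b/i) doesn't work for strings consisting of two words, such as "Czech Republic": the space between the two words is itself a non-empty run of characters other than 'e' and 'n' with a word boundary on each side, so the pattern still matches.

Try /\b[^en]+\b$/i instead. The trailing $ requires the matched run to extend to the end of the string. That seems to work for me.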
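
To see what is going on with the two-word case, here is a small sketch for the Rascal terminal. It reuses the capture trick from the other answer (the variable name w1 is borrowed from there); the "Italy" test string is just an illustrative assumption on my part, not taken from the question's data.

```rascal
import IO;

// With "+" alone, the first boundary-delimited run without 'e' or 'n'
// is the single space between "Czech" and "Republic".
if (/<w1:\b[^en]+\b>/i := "Czech Republic")
  println("The match is: |<w1>|");   // prints: The match is: | |

// With "$", the run has to reach the end of the string, and no such run
// in "Czech Republic" starts at a word boundary.
/\b[^en]+\b$/i := "Czech Republic";  // false

// A string without 'e' or 'n' (illustrative) still matches.
/\b[^en]+\b$/i := "Italy";           // true
```

The printed bars contain exactly one space, which is why the unanchored pattern reports a match even though both words contain an 'e'.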

Upvotes: 0

Mark Hills

Reputation: 1038

Actually, the online regex-tester is giving the same match that we are giving. You can look at the match as follows:

if (/<w1:\b[^eEnN]?\b>/ := "Slander") 
  println("The match is: |<w1>|");

This binds the matched string to w1 and then prints it between the vertical bars, provided the match succeeds (if it doesn't, the condition is false, so the body of the if will not execute). If you do this, you will get back a match to the empty string, because both \b assertions succeed at the very start of the string with zero characters between them:

The match is: ||

The online regex tester says the same thing:

 Match 1
 Full match 0-0 ''

If you want to prevent this, you can force at least one occurrence of the characters you are looking for by using a + instead of a ?:

rascal>/\b[^eEnN]+\b/ := "Slander";
bool: false

Note that you can also make the regex match case insensitive by following it with an i, like so:

/\b[^en]+\b/i

This may make it easier to write if you need to add more characters into the character class.
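
As a minimal sketch, this is how the case-insensitive pattern could be plugged back into the comprehension from the question. The contents of eu below are made up for illustration, since the question does not show the real set, and all of them are single words.

```rascal
// Hypothetical sample data; the real contents of eu are not shown in the question.
set[str] eu = {"Italy", "Malta", "Slander"};

// Keep only the strings in which the pattern finds a non-empty run.
set[str] noNnoE = { v | str v <- eu, /\b[^en]+\b/i := v };
// noNnoE is now {"Italy", "Malta"}; "Slander" is rejected.
```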

Upvotes: 2
