adis

Reputation: 5951

Java Regex: find word that matches begin and end

I'm new to regular expressions in general and have only just started reading about them, so be gentle :-)

I want to find all words that begin with my(" or my('. The word itself can contain underscores, letters, digits, basically any character. But it should end with ") or ').

So I tried the following:

Pattern.compile("_(\"(.*)\")"); // for underscores first, instead of my

and

Pattern.compile("(my)(\"(.*)\")");

But this gives me other things back as well, and I can't see where I am going wrong...

Thanks

Upvotes: 3

Views: 3837

Answers (3)

Prince John Wesley

Reputation: 63698

Use a word boundary:

\bmy\((["']).*?\1\)(?:\b|$)
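As a rough sketch of how the backreference idea could be used from Java: group 1 captures the opening quote and \1 requires the same quote before the closing parenthesis. The class name BackreferenceDemo and the helper findArgs are made up for illustration. The sketch also leaves out the trailing (?:\b|$) group, on the assumption that a call followed by a space or punctuation should still match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // Group 1 is the opening quote, group 2 the text between the quotes.
    // \1 forces the closing quote to be the same character as the opening one.
    static final Pattern CALL = Pattern.compile("\\bmy\\(([\"'])(.*?)\\1\\)");

    // Hypothetical helper: collects the quoted text of every my(...) call found.
    static List<String> findArgs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CALL.matcher(input);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // The last call mixes quote styles and is not reported.
        System.out.println(findArgs("x = my(\"foo_1\") + my('bar') + my(\"bad')"));
    }
}
```

Note that .*? is lazy but can still run past a mismatched quote if a matching quote followed by ) appears later in the input, so this is best suited to input where that cannot happen.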

Upvotes: 0

Thomas

Reputation: 88707

If you want to match my("xxx") and my('xxx') but not my("xxx') then try the following expression:

my\((?:"[^"]*"|'[^']*')\)

Here's a short breakdown of the expression:

  • my\(...\) means the match should start with my( and end with )
  • (?:"[^"]*"|'[^']*') means a sequence of characters surrounded by either double quotes or single quotes (the character classes [^"] and [^'] mean "any character that is not a double quote" and "any character that is not a single quote", respectively)
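The breakdown above could be tried out in Java roughly like this. It is only a sketch: the class name AlternationDemo and the helper findCalls are invented for the example, and the only real work is escaping the backslashes and double quotes for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // Same expression as above, escaped for a Java string literal:
    // my\((?:"[^"]*"|'[^']*')\)
    static final Pattern CALL = Pattern.compile("my\\((?:\"[^\"]*\"|'[^']*')\\)");

    // Hypothetical helper: returns every complete my("...") or my('...') match.
    static List<String> findCalls(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CALL.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // my("c') uses mixed quotes, so neither alternative can match it.
        for (String call : findCalls("my(\"a_1\") my('b2') my(\"c')")) {
            System.out.println(call);
        }
    }
}
```

Because each alternative excludes its own quote character from the content, a match can never run past the closing quote.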

Edit:

The problem with the expression (my)("(.*)") is that .* is greedy: it matches anything, so the match would start at my(" but end at the last ") in the input. Thus it would match all of my("xxx") your("yyy"), because .* matches xxx") your("yyy.
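To see the greedy behaviour for yourself, something like the following could be run. It is a sketch under assumptions: the parentheses are escaped here (my\("(.*)"\)) so that they match literally, and GreedyDemo and firstGroup are names made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Greedy: .* grabs as much as possible, then backs off to the LAST ").
    static final Pattern GREEDY = Pattern.compile("my\\(\"(.*)\"\\)");

    // Negated class: [^"]* cannot cross a double quote, so it stops at the first one.
    static final Pattern NEGATED = Pattern.compile("my\\(\"([^\"]*)\"\\)");

    // Hypothetical helper: returns group 1 of the first match, or null if none.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "my(\"xxx\") your(\"yyy\")";
        System.out.println(firstGroup(GREEDY, input));  // xxx") your("yyy
        System.out.println(firstGroup(NEGATED, input)); // xxx
    }
}
```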

For more information on regular expressions see http://www.regular-expressions.info

Upvotes: 2

npinti

Reputation: 52185

In regular expressions, the parentheses ( and ) are reserved characters, so you will need to escape them. So this regex should do the trick: _\\(\"(.*)\"\\). However, you also stated that you wanted to find words which must begin with my( and must end with "). So you will need to add anchors like so: ^my\\([\"'](.*)[\"']\\)$. This should match any string which starts with my(" or my(' and ends with ") or ').

The ^ and $ are anchors. The ^ requires the match to start at the beginning of the string and the $ requires it to end at the end of the string. If you remove these anchors, the following would be considered as matches: foo my('...') bar, my("...") bar, etc.

However, this pattern does not check that the opening and closing quotes are the same, so it will also match strings like my("...') and my('...").
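A small sketch of the anchored pattern in Java, in case it helps. The class name AnchorDemo and the method isCall are made up for the example; find() is used so that the ^ and $ in the pattern do the anchoring themselves.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // ^ and $ tie the pattern to the start and end of the whole input string.
    static final Pattern WHOLE = Pattern.compile("^my\\([\"'](.*)[\"']\\)$");

    // Hypothetical helper: true only if the entire string is a my(...) call.
    static boolean isCall(String input) {
        return WHOLE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isCall("my(\"abc\")"));        // true
        System.out.println(isCall("foo my('abc') bar"));  // false: text around the call
        System.out.println(isCall("my(\"abc')"));         // true: mixed quotes are accepted
    }
}
```

If mixed quotes should be rejected, the two character classes would have to be tied together, for example with a backreference or an alternation.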

Upvotes: 0
