Ahmed Aeon Axan

Reputation: 2139

Regex "(.*)" vs "([^"]*)"

What I would like to do is get all the text between the quotes. Both of these seem to work properly (using Ruby). Can someone please tell me whether there is a difference in how the two work, or whether they are just the same thing expressed differently?

Edit: I'm mostly looking at getting the text between double quotes for Cucumber step definitions (Then I should see "Hello World").

Upvotes: 2

Views: 2787

Answers (5)

freemanoid

Reputation: 14770

  • "(.*)" - this regex matches everything between the first and the last quote on the line, because .* is greedy and accepts any character, quotes included.
  • "([^"]*)" - this regex matches everything between the first and the second quote, because [^"] keeps the quote character out of the match.
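To see the two behaviours side by side, here is a minimal sketch in Ruby. The step text is a made-up example with two quoted arguments, not something taken from the question.

```ruby
# Hypothetical step text with two quoted arguments, chosen for illustration.
step = 'Then I should see "Hello" and "World"'

greedy  = step.match(/"(.*)"/)      # .* runs to the end, then backs up to the last quote
negated = step.match(/"([^"]*)"/)   # [^"]* cannot cross a quote, so it stops at the second one

greedy_capture  = greedy[1]   # text between the first and the last quote
negated_capture = negated[1]  # text between the first and the second quote

puts greedy_capture
puts negated_capture
```

With a single quoted argument, as in the step from the question, both patterns capture the same text; they only diverge once a second pair of quotes appears on the line.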

I recommend Rubular for checking regexes.

Upvotes: 4

Håkan Lindqvist

Reputation: 495

They differ in that . matches any character (except a newline, by default), whereas [^"] matches any character except a quotation mark.

To make them behave more consistently, you could change the first example to "(.*?)", which makes the matching non-greedy: it captures the shortest string it can, which avoids running on to another closing quotation mark later in the text.
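A small sketch of the non-greedy variant in Ruby, using made-up input strings. It also shows one case where the lazy form and the negated class still differ: without the /m flag, . does not match a newline in Ruby, while [^"] does.

```ruby
# Illustrative input, not from the original question.
line = 'say "one" then "two"'

lazy    = line[/"(.*?)"/, 1]     # shortest possible capture
negated = line[/"([^"]*)"/, 1]   # stops at the first closing quote
greedy  = line[/"(.*)"/, 1]      # longest possible capture

# A quoted string that spans two lines.
multiline = "say \"one\ntwo\" done"

lazy_multiline    = multiline[/"(.*?)"/, 1]    # nil: . cannot cross the newline
negated_multiline = multiline[/"([^"]*)"/, 1]  # [^"] happily matches the newline

p lazy, negated, greedy, lazy_multiline, negated_multiline
```

So on single-line input the lazy and negated forms capture the same text, but they are not interchangeable if the quoted text can contain line breaks.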

Upvotes: 2

boxed

Reputation: 4427

They are not the same. * is greedy, so "(.*)" will match:

fooo "bar" baz "asd"

all the way from the quotation mark before bar to the one after asd, which is probably not what you want. Your second example avoids that.

Upvotes: 1

Sergey Kalinichenko

Reputation: 727057

The first one may not get you the same data in case of multiple quoted strings: if the input data is, say

"hello" "world"

the first expression will match the entire string, while the second one will match only the "hello" portion.
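If the goal is to pull out every quoted string rather than just the first, the difference shows up clearly with String#scan. This is only a sketch on the sample input above:

```ruby
input = '"hello" "world"'

# scan returns one array of captures per match; flatten gives a plain list.
greedy_all  = input.scan(/"(.*)"/).flatten
negated_all = input.scan(/"([^"]*)"/).flatten

p greedy_all
p negated_all
```

The greedy pattern yields a single match that swallows both strings and the quotes between them, while the negated class yields one match per quoted string.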

In general, the second expression should be faster, because [^"]* stops at the closing quote instead of running to the end of the line and backtracking. Here is a link to an article discussing this issue at length.

Upvotes: 4

Ivaylo Strandjev

Reputation: 71009

Hmm, that should not be so. (.*) will match anything, even if it includes quotes; ([^"]*), on the other hand, will match any number of characters that are not quotes.

Upvotes: 2
