Reputation: 2139
What I would like to do is get all the text between the quotes. Both "(.*)" and "([^"]*)" seem to work properly (using Ruby). Can someone please tell me whether there is a difference in how these two work, or whether they are just the same thing expressed differently?
Edit: I'm mostly looking at getting the text between double quotes for Cucumber step definitions (Then I should see "Hello World").
Upvotes: 2
Views: 2787
Reputation: 14770
"(.*)" matches everything between the first and last quotes of the string, because .* includes any character in the match.
"([^"]*)" matches everything between the first and second quotes of the string, because [^"] excludes the quote character from the match.
I recommend rubular for regex checking.
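A minimal sketch of the difference in Ruby. The sample string and variable names are made up for illustration; only the two patterns come from the question.

```ruby
# Hypothetical sample text containing two quoted segments.
text = 'say "foo" and "bar" now'

# String#[] with a regexp and a group index returns that capture group.
greedy  = text[/"(.*)"/, 1]      # .* runs on to the LAST quote
negated = text[/"([^"]*)"/, 1]   # [^"]* stops at the NEXT quote

puts greedy    # foo" and "bar
puts negated   # foo
```

With only one quoted segment in the input, both patterns capture the same text, which is why they can appear interchangeable.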
Upvotes: 4
Reputation: 495
They are different in that . will match any character, while [^"] will match any character except a quotation mark.
To make them behave more consistently, you could change the first example into "(.*?)", which makes the matching of any character non-greedy (it will capture the shortest string it can, which avoids the risk of finding another end-quotation mark later in the text).
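A short Ruby sketch of the non-greedy variant, using made-up sample strings. It also shows one case, assumed here purely for illustration, where the non-greedy form and the negated class still diverge: when more pattern follows the closing quote.

```ruby
# Hypothetical input with two quoted segments.
text = 'fooo "bar" baz "asd"'

lazy    = text[/"(.*?)"/, 1]     # shortest possible capture
negated = text[/"([^"]*)"/, 1]   # cannot cross a quote at all

puts lazy      # bar
puts negated   # bar

# If the pattern continues after the closing quote, .*? may still
# expand across quotes to make the rest match; [^"]* never will.
tail = 'a "b" c "d" end'
lazy_tail    = tail[/"(.*?)" end/, 1]
negated_tail = tail[/"([^"]*)" end/, 1]

puts lazy_tail      # b" c "d
puts negated_tail   # d
```

So the two forms agree on simple inputs, but the negated class is the stricter guarantee that a capture never contains a quote.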
Upvotes: 2
Reputation: 4427
They are not the same. * is greedy, so "(.*)" will match:
fooo "bar" baz "asd"
everything from the quotation mark before bar through the one after asd, which is probably not what you want. Your second example avoids that.
Upvotes: 1
Reputation: 727057
The first one may not get you the same data in case of multiple quoted strings: if the input data is, say
"hello" "world"
the first expression will match the entire string, while the second one will match only the "hello" portion.
In general, the second expression should be faster, because there is no backtracking. Here is a link to an article discussing this issue at length.
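As a rough illustration in Ruby, String#scan makes the difference visible on the input above. The variable names are hypothetical, and this sketch shows only what each pattern matches, not how fast it runs.

```ruby
input = '"hello" "world"'

# scan returns one array per match when the pattern has a group,
# so flatten gives a plain list of captured strings.
greedy_hits  = input.scan(/"(.*)"/).flatten
negated_hits = input.scan(/"([^"]*)"/).flatten

p greedy_hits    # ["hello\" \"world"]
p negated_hits   # ["hello", "world"]
```

The greedy pattern finds a single match spanning both quoted strings, while the negated class finds each quoted string separately.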
Upvotes: 4
Reputation: 71009
Hmm, that should not be so. (.*) will match anything, even if it includes quotes; ([^"]*), on the other hand, will match any number of characters that are not quotes.
Upvotes: 2