ldavin

Reputation: 423

Ruby regex: exclude apostrophe but include it if it's escaped

I am trying to write a Ruby regex to extract some data from a long string (HTML source code).

From the string below, I want to keep the four numbers (1, 11, 30, 90) and the first single-quoted string (blablabla):

AjouterRDV(1, 11, 30, 90, 'blablabla', '123' ... (it goes on) );

My regex works with the example above, but fails when the string contains an escaped apostrophe, as in:

AjouterRDV(1, 11, 30, 90, 'it\'s failing!', '123' ... (it goes on) );

Here is my regex with two example strings (one passing, the other failing) - Rubular

Upvotes: 1

Views: 437

Answers (3)

garyh

Reputation: 2852

A simpler way (assumes you don't need to match anything past your captures):

AjouterRDV\((\d+),(\d+),(\d+),(\d+),'(.+?)',

See Rubular example
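As a rough sketch of how the lazy `.+?` behaves in Ruby: it stops at the first `',` it finds, so an escaped apostrophe inside the text is fine, but an escaped apostrophe directly followed by a comma cuts the capture short. The `\s*` after each comma is an assumption added here so the pattern tolerates the spaces in the sample input; the variable names and the `tricky` string are made up for illustration.

```ruby
# Lazy variant of the pattern; \s* is an assumption to allow ", " separators.
lazy = /AjouterRDV\(\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*'(.+?)',/

escaped = "AjouterRDV(1, 11, 30, 90, 'it\\'s failing!', '123');"
# Hypothetical input: an escaped apostrophe immediately followed by a comma.
tricky  = "AjouterRDV(1, 11, 30, 90, 'yes\\', no', '123');"

lazy_escaped = lazy.match(escaped)[5]
lazy_tricky  = lazy.match(tricky)[5]

puts lazy_escaped # the full text, backslash included
puts lazy_tricky  # truncated at the first "'," sequence
```

If the quoted text can never contain `',`, this simpler pattern is probably enough.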

Upvotes: 3

Rohit Jain

Reputation: 213351

You can try this:

/AjouterRDV\( (\d+), (\d+), (\d+), (\d+), '((?:(?<=\\)[']|[^'])*)', .* \);$/ix

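The part '((?:(?<=\\)[']|[^'])*)' matches, character by character, either a ' that is preceded by \, or any character other than '. The match therefore ends at the first unescaped apostrophe.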
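A minimal sketch of using this lookbehind pattern from Ruby code follows. The method name `extract_rdv` and the returned hash are hypothetical, and the `\s*` after each comma is an assumption: in extended (`x`) mode literal spaces in the pattern are ignored, so the sketch drops the `x` flag and matches the separators explicitly. It also leaves out the trailing `.* \);$` part, since only the first five captures are needed here.

```ruby
# Hypothetical helper: returns the four numbers and the first quoted string,
# or nil when the call is not found in the source.
def extract_rdv(source)
  pattern = /AjouterRDV\(\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*'((?:(?<=\\)'|[^'])*)'/
  match = pattern.match(source)
  return nil unless match

  numbers = match.captures.first(4).map(&:to_i)
  { numbers: numbers, text: match[5] }
end

plain   = "AjouterRDV(1, 11, 30, 90, 'blablabla', '123');"
escaped = "AjouterRDV(1, 11, 30, 90, 'it\\'s failing!', '123');"

p extract_rdv(plain)
p extract_rdv(escaped)
```

Note that the captured text still contains the backslash; removing it would be a separate step.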

Upvotes: 2

Confusion

Reputation: 16861

Hmmm, someone just posted a comment, but it seems to have been deleted. The proposal was:

AjouterRDV\( (\d+), (\d+), (\d+), (\d+), '((?<=\\)[']|[^'])*', .* \);$

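which almost works, except that it doesn't capture the 5th group correctly: a capturing group repeated with * keeps only its last iteration, so you get the final character instead of the whole string. For that you need: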

AjouterRDV\( (\d+), (\d+), (\d+), (\d+), '((?:(?<=\\)[']|[^'])*)', .* \);$

which turns the repeated group into a non-capturing group and wraps the whole repetition in a new capturing group, so the 5th capture holds everything between the single quotes.
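The difference between the two patterns can be seen in isolation with a short Ruby sketch. It uses only the quoted part of each pattern, and the variable names are illustrative.

```ruby
quoted = "'it\\'s failing!'"

# Capturing group repeated with *: only the last iteration is kept.
repeated_capture = quoted.match(/'((?<=\\)'|[^'])*'/)
# Non-capturing group repeated, wrapped in one capturing group.
wrapped_capture  = quoted.match(/'((?:(?<=\\)'|[^'])*)'/)

last_iteration = repeated_capture[1]
whole_string   = wrapped_capture[1]

puts last_iteration # just the final character of the quoted text
puts whole_string   # everything between the outer quotes
```

Both patterns match the same span of input; they differ only in what ends up in group 1.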

Upvotes: 1
