Reputation: 5858
Ive encounter this problem.
I need to clean some corrupted data that has the following pattern:
VALUES ( 1', NULL, 'Smith', 'Jackson',...
VALUES ( 2', 'jon@doe', 'John', 'Doe',...
Basically, I need to remove the extra '
, for that I first need to match it...
So I came up with this
(?<=VALUES \( \d+)(.*')(?=(, '[a-zA-Z])||(^NULL,$))
Unfortunately, it's not matching anything, the \d+
is specially problematic.
Any idea how I can isolate the 1'
, 2'
, ... (only the '
after the number that can have N digits) and be able to match the prefix for 'string
or NULL
Upvotes: 2
Views: 851
Reputation: 163207
One option is to use
\b(VALUES\s*\(\s*\d+)'(?=,\s*(?:NULL|'[a-zA-Z]))
Explanation
\b
A word boundary(
Capture group 1 (to be used in the replacement keeping this part)
VALUES\s*\(\s*\d+
Match VALUES, (
and then match 1+ digits)
Close group 1'
Match the '
that you want to remove(?=,\s*(?:NULL|'[a-zA-Z])
Positive lookahead to make sure that directly to the right is a comma followed by NULL or '
and a char a-zIn the replacement use group 1.
Output
VALUES ( 1, NULL, 'Smith', 'Jackson',...
VALUES ( 2, 'jon@doe', 'John', 'Doe',...
Upvotes: 2
Reputation: 520878
Assuming the pattern/trend you showed continues for the entire VALUES
clause, I would recommend the following find and replace, in regex mode:
Find: (VALUES\s+\(\s*\d+)',
Replace: $1,
Note: Another possibility, perhaps simpler than above, might be available if the stray single quote appearing after the numbers also happens to appear in a single column. If so, then you could do a block select of that column across the entire VALUES
clause and just delete it away.
Upvotes: 2