QuickDrawMcgraw
QuickDrawMcgraw

Reputation: 83

REGEXP to insert special characters, not remove

How would i put double quotes around the two fields that are missing it? Would i be able to use like a INSTR/SUBSTR/REPLACE in one statement to accomplish it?

string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';

Expected string := '"ES26653","ABCBEVERAGES","861526999728","**606.32**","2017-01-26","2017-01-27","","","**77910467**","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';

Please suggest! Thank you.

Upvotes: 0

Views: 99

Answers (3)

Gary_W
Gary_W

Reputation: 10360

This method makes 2 passes on the string. First look for a grouping of a double-quote followed by a comma, followed by a character that is not a double-quote. Replace them by referring to them with the shorthand of their group, the first group, '\1', the missing double-quote, the second group '\2'. Then do it again, but the other way around. Sure you could nest the regex_replace calls and end up with one big ugly statement, but just make it 2 statements for easier maintenance. The guy working on this after you will thank you, and this is ugly enough as it is.

SQL> with orig(str) as (
     select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017
-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA
","NE","68144"'
     from dual
   ),
   rpl_first(str) as (
     select regexp_replace(str, '(",)([^"])', '\1"\2')
   from orig
   )
   select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
   from rpl_first;

FIXED_STRING
--------------------------------------------------------------------------------

"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""

,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681

44"


SQL>

EDIT: Changed regex's and added a third step to allow for empty, unquoted fields per Unoembre's comment. Good catch! Also added additional test cases. Always expect the unexpected and make sure to add test cases for all data combinations.

SQL> with orig(str) as (
        select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2
017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OM
AHA","NE","68144"'
        from dual union
        select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
        select '"ES26653","ABCBEVERAGES",861526999728' from dual union
        select '1S26653,"ABCBEVERAGES",861526999728' from dual union
        select '"ES26653",,861526999728' from dual
      ),
      rpl_empty(str) as (
        select regexp_replace(str, ',,', ',"",')
        from orig
      ),
      rpl_first(str) as (
        select regexp_replace(str, '(",|^)([^"])', '\1"\2')
      from rpl_empty
      )
      select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
      from rpl_first;

FIXED_STRING
--------------------------------------------------------------------------------

"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""

,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681

44"

"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"

SQL>

Upvotes: 0

Unoembre
Unoembre

Reputation: 555

This offering attempts to address a number of end cases:

  • Addressing issues with first and last fields. Here only the last field is a special case as we look out for the end-of-string $ rather than a comma.
  • Empty unquoted fields i.e. leading commas, consecutive commas and trailing commas.
  • Preserving a pair of double quotes within a field representing a single double quote.

The SQL:

WITH orig(str) AS (
     SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
     FROM dual
   ),
   rpl_first(str) AS (
     SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5') 
   FROM orig
   )
   SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
   FROM rpl_first;

The technique is to find either a quoted field and remember it or a non-quoted field and remember it, terminated by a comma or end-of-string and remember that. The answers is then a " followed by one of the fields followed by " and then the terminator.

The quoted field is basically "[^"]*" where [^"] is a any character that is not a quote and * is repeated zero or more times. This is complicated by the fact the not-a-quote character could also be a pair of quotes so we need an OR construct (|) i.e. "([^"]|"")*". However we must remember just the field inside the quotes so add brackets so we can later back reference just that i.e. "(([^"]|"")*)".

The unquoted field is simply a non-comma repeated zero or more times where we want to remember it all ([^,]*).

So we want to find either of these, the OR construct again i.e. ("(([^"]|"")*)"|([^,]*)). Followed by the terminator, either a comma or end-of-string, which we want to remember i.e. (,|$).

Now we can replace this with one of the two types of field we found enclosed in quotes followed by the terminator i.e. "\2\4"\5. The number n for the back reference \n is just a matter of counting the open brackets.

The second REGEXP_REPLACE is to work around something I suspect is an Oracle bug. If the last field is quoted then a extra pair of quotes is added to the end of the string. This suggests that the end-of-string is being processed twice when it is parsed, which would be a bug. However regexp processing is probably done by a standard library routine so it may be my interpretation of the regexp rules. Comments are welcome.

Oracle regexp documentation can be found at Using Regular Expressions in Database Applications.

My thanks to @Gary_W for his template. Here I am keeping the two separate regexp blocks to separate the bit I can explain from the bit I can't (the bug?).

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269953

This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.

One rather brute force method for internal fields is:

replace(replace(string, ',', '","'), '""', '"')

This adds double quotes on either side of a comma and then removes double double quotes. You don't need to worry about "". It becomes """" and then back to "".

This can be adapted for the first and last fields as well, but it complicates the expression.

Upvotes: 1

Related Questions