Tcheslav Tcheslav
Tcheslav Tcheslav

Reputation: 190

Remove all occurrence of specific character from strings surrounded by double quotes

I have below string (this is a single string - I've added newlines only for readability):

"someField0";"some value1; some value2; some value3; some value4";
"someField1";"some value1; some value2; some value3";
"someField2";"some value1; some value2";
"someField3";"some value123";

I need to get rid of any occurrence of ; character from quoted parts using regex / substitution. Desired result is like below:

"someField0";"some value1 some value2 some value3 some value4";
"someField1";"some value1 some value2 some value3";
"someField2";"some value1 some value2";
"someField3";"some value123";

Is it possible to achieve this using only regex with substitution?

I have found several solutions like below, but all of them are capable to remove only one occurrence.

("[^",]+);([^"]+")

And then substitute with

$1 $2

I am searching of solution based on pure regex because I need to implement this inside SQL query using regex_replace function. This is a part of more complex parsing statement.

Upvotes: 2

Views: 140

Answers (4)

Rajat
Rajat

Reputation: 5803

You could try using backreferences on capture groups. The idea is to stitch first and third capture group, and leave the second out.

select regexp_replace(str,'([^"])([;]+)([^"])','\\1\\3')

Upvotes: 1

Eternal Dreamer
Eternal Dreamer

Reputation: 483

Like @Bart McEndree suggested. You may temporarily replace ; enclosed by "

Example :

SELECT REPLACE(REPLACE(REPLACE(text, '";"', '"""'), ';', ''), '"""', '";"') FROM <table>;

In this example, I temporarily replace ";" by """, but you may change that.

Upvotes: 1

Cary Swoveland
Cary Swoveland

Reputation: 110755

It's relatively inefficient but you could make use of the fact that a character is within a substring enclosed in double-quotes if the remaining number of double quotes in the string is odd.

You therefore could replace matches of the regular expression

;(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$)

with an empty string.

Demo.

The problem is that Snowflake regexp functions don't support non-capturing groups, lookaheads and lookbehinds. However, as pointed out by @GregPavli in a comment here, "Snowflake supports JavaScript and it supports all of these. You can write a JavaScript UDF to get non-capturing groups or lookarounds. I've also written a library of JavaScript UDFs that approximate the syntax of the Snowflake regexp functions as closely as possible that support these capabilities here".

Upvotes: 2

dawg
dawg

Reputation: 104092

You essentially have a CSV file. Since Python has a very capable csv parser, try that:

import csv 
from io import StringIO 

txt='''\
"someField0";"some value1; some value2; some value3; some value4";
"someField1";"some value1; some value2; some value3";
"someField2";"some value1; some value2";
"someField3";"some value123";
'''

r=csv.reader(StringIO(txt), delimiter=';')

for row in r:
    print(';'.join([f"\"{s.replace(";","")}\"" for s in row]))

Prints:

"someField0";"some value1 some value2 some value3 some value4";""
"someField1";"some value1 some value2 some value3";""
"someField2";"some value1 some value2";""
"someField3";"some value123";""

Since you have ; at the end of each line, that is why you have the blank field at the end of each line. You can remove the trailing ; in the input or test for a blank field on output if you don't want that.

Upvotes: 0

Related Questions