Reputation: 190
I have below string (this is a single string - I've added newlines only for readability):
"someField0";"some value1; some value2; some value3; some value4";
"someField1";"some value1; some value2; some value3";
"someField2";"some value1; some value2";
"someField3";"some value123";
I need to get rid of any occurrence of ; character from quoted parts using regex / substitution. Desired result is like below:
"someField0";"some value1 some value2 some value3 some value4";
"someField1";"some value1 some value2 some value3";
"someField2";"some value1 some value2";
"someField3";"some value123";
Is it possible to achieve this using only regex with substitution?
I have found several solutions like below, but all of them are capable to remove only one occurrence.
("[^",]+);([^"]+")
And then substitute with
$1 $2
I am searching of solution based on pure regex because I need to implement this inside SQL query using regex_replace function. This is a part of more complex parsing statement.
Upvotes: 2
Views: 140
Reputation: 5803
You could try using backreferences on capture groups. The idea is to stitch first and third capture group, and leave the second out.
select regexp_replace(str,'([^"])([;]+)([^"])','\\1\\3')
Upvotes: 1
Reputation: 483
Like @Bart McEndree suggested.
You may temporarily replace ;
enclosed by "
Example :
SELECT REPLACE(REPLACE(REPLACE(text, '";"', '"""'), ';', ''), '"""', '";"') FROM <table>;
In this example, I temporarily replace ";"
by """
, but you may change that.
Upvotes: 1
Reputation: 110755
It's relatively inefficient but you could make use of the fact that a character is within a substring enclosed in double-quotes if the remaining number of double quotes in the string is odd.
You therefore could replace matches of the regular expression
;(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$)
with an empty string.
Demo.
The problem is that Snowflake regexp functions don't support non-capturing groups, lookaheads and lookbehinds. However, as pointed out by @GregPavli in a comment here, "Snowflake supports JavaScript and it supports all of these. You can write a JavaScript UDF to get non-capturing groups or lookarounds. I've also written a library of JavaScript UDFs that approximate the syntax of the Snowflake regexp functions as closely as possible that support these capabilities here".
Upvotes: 2
Reputation: 104092
You essentially have a CSV file. Since Python has a very capable csv parser, try that:
import csv
from io import StringIO
txt='''\
"someField0";"some value1; some value2; some value3; some value4";
"someField1";"some value1; some value2; some value3";
"someField2";"some value1; some value2";
"someField3";"some value123";
'''
r=csv.reader(StringIO(txt), delimiter=';')
for row in r:
print(';'.join([f"\"{s.replace(";","")}\"" for s in row]))
Prints:
"someField0";"some value1 some value2 some value3 some value4";""
"someField1";"some value1 some value2 some value3";""
"someField2";"some value1 some value2";""
"someField3";"some value123";""
Since you have ;
at the end of each line, that is why you have the blank field at the end of each line. You can remove the trailing ;
in the input or test for a blank field on output if you don't want that.
Upvotes: 0