Reputation: 133
Using SQL, I am trying to update substring in a column with regexp.
That's my goal :
I have HTML content in my column : example :
<a href="http://blabla.fr/blibli">Link1</a>
or an other content with multiple links:
<p>BlaBlaBla</p>
<a href="http://blabla.fr/blibli">Link1</a>
<a href="http://blabla.fr/">Link2</a>
<a href="http://blabla.fr/test">Link3 </a>
(This example represente an entire string, not many result of select ) - I want to remove 'http://' in urls which not contains 'blibli'
So I did this query
UPDATE my_table SET my_col = replace(my_col,'http://','') where not regexp_like(my_col,'http://^((?blibli).)*$')
But this work only if the column contains at least one link like my first example. If the column contains many link, it will update nothing due to the conditions. (Maybe doing this with many step? ) I tried to have a look to REGEXP_SUBSTR but i think i can't achieve this with it too. The desired result for the second example is :
<p>BlaBlaBla</p>
<a href="http://blabla.fr/blibli">Link1</a>
<a href="blabla.fr/">Link2</a>
<a href="blabla.fr/test">Link3 </a>
Thanks for any advice you can give me
Upvotes: 3
Views: 5496
Reputation: 191245
Similar to Trung's approach, but capturing both (either) matched group:
update my_table set my_col = regexp_replace(my_col,
'href="(http://[^"]*blibli[^"]*)"|href="http://([^"]*)"', 'href="\1\2"')
optionally with a where
clause to avoid updating rows that don't have an href
, say.
Quick demo of how sample values (supplied in a CTE) are translated:
with my_table (id, my_col) as (
select 1, '<a href="http://blabla.fr/blibli">Link1</a>' from dual
union all select 2, '<a href="http://blabla.fr/blibli/abc">Link1</a>' from dual
union all select 3, '<a href="http://blabla.fr/xbliblix">Link1</a>' from dual
union all select 4, '<a href="http://blabla.fr/test">Link1</a>' from dual
union all select 5, '<p>BlaBlaBla</p>
<a href="http://blabla.fr/blibli">Link1</a>
<a href="http://blabla.fr/">Link2</a>
<a href="http://blabla.fr/test">Link3 </a>' from dual
)
select id, my_col, regexp_replace(my_col,
'href="(http://[^"]*blibli[^"]*)"|href="http://([^"]*)"',
'href="\1\2"') as result
from my_table;
ID MY_COL RESULT
---------- ------------------------------------------------ ------------------------------------------------
1 <a href="http://blabla.fr/blibli">Link1</a> <a href="http://blabla.fr/blibli">Link1</a>
2 <a href="http://blabla.fr/blibli/abc">Link1</a> <a href="http://blabla.fr/blibli/abc">Link1</a>
3 <a href="http://blabla.fr/xbliblix">Link1</a> <a href="http://blabla.fr/xbliblix">Link1</a>
4 <a href="http://blabla.fr/test">Link1</a> <a href="blabla.fr/test">Link1</a>
5 <p>BlaBlaBla</p> <p>BlaBlaBla</p>
<a href="http://blabla.fr/blibli">Link1</a> <a href="http://blabla.fr/blibli">Link1</a>
<a href="http://blabla.fr/">Link2</a> <a href="blabla.fr/">Link2</a>
<a href="http://blabla.fr/test">Link3 </a> <a href="blabla.fr/test">Link3 </a>
Upvotes: 2
Reputation: 3475
You could use REGEXP_REPLACE, with grouping
UPDATE my_table SET my_col= REGEXP_REPLACE(my_col, 'href="(http://.*blibli[^"]*)|href="http://([^"])', 'href="\1')
You could check the sample at this link http://rextester.com/HEMG60862
Upvotes: 4
Reputation: 273
You said "I want to remove 'http://' in urls which not contains 'blibli'"
Why do not you use simple regex like that?
UPDATE my_table SET my_col= replace(my_col,'http://','') where not regexp_like(my_col,'*blibli*')
Upvotes: 1