J Yang
J Yang

Reputation: 1

Postgres: remove second occurrence of a string

I tried to fix bad data in postgres DB where photo tags are appended twice.

The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>

As you can see in the string, photo tags were added already, but they were appended to the text again. I want to remove the second occurrence: . The first occurrence has certain order and I want to keep them.

I wrote a function that could construct a regex pattern:

CREATE OR REPLACE FUNCTION dd_trip_photo_tags(tagId int) RETURNS text 
LANGUAGE sql IMMUTABLE  
AS $$
SELECT string_agg(concat('<photo=',media_name,'>.*?(<photo=',media_name,'>)'),'|')  FROM   t_ddtrip_media WHERE tag_id=tagId $$;

This captures the second occurrence of a certain photo tag. Then, I use regex_replace to replace the second occurrence: update t_ddtrip_content set content = regexp_replace(content,dd_trip_photo_tags(332761),'') from t_ddtrip_content where tag_id=332761;

Yet, it would remove all matched tags. I looked up online for days but still couldn't figure out a way to fix this. Appreciate any help.

Upvotes: 0

Views: 978

Answers (1)

Eduardo Lynch Araya
Eduardo Lynch Araya

Reputation: 824

This Should Work.

Regex 1:

<photo=.+?>

See: https://regex101.com/r/thHmlq/1

Regex 2:

<.+?>

See: https://regex101.com/r/thHmlq/2

Input:

The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>

Output:

<photo=2-1-1601981-7-1.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-7-1.jpg>

Upvotes: 1

Related Questions