Reputation: 21
I want to remove duplicate strings from the Col B
. For example :"New Cap Grp" is repeated five times in second record.
Col A Col B
----- -----
WDSA ALT COMPANY, III & New Group
1101 New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp
2255 Tata Associates Inc. & Tata Associates Inc.& Towers Watson
3355 Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.
8877 Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd
I am new to regular expressions so I am not able to figure out how exactly this can be achieved. If anyone knows how to handle such scenarios then please help me.
Upvotes: 2
Views: 724
Reputation: 1470
I think it's impossible do using only regexp expresion because you must do update for Col B* value.
It's easier do on PL/SQL, I try do it:
create table test
(
id number,
text varchar2(100)
);
insert into test values (1, 'ALT COMPANY, III & New Group');
insert into test values (2, 'New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp');
insert into test values (3, 'Tata Associates Inc. & Tata Associates Inc.& Towers Watson');
insert into test values (4, 'Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.');
insert into test values (5, 'Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd');
declare
l_new_column_value varchar2(1024) := '';
begin
-- go on all row
for x in (select id, text from test)
loop
-- work with each row, do from one row several by separation symbol '&' and take distinct value
for concat_text in (
select distinct trim(regexp_substr(text, '[^&]+', 1, level)) as part_value
from
(
select text
from test
where id = x.id
)
connect by instr(text, '&', 1, level - 1) > 0)
loop
-- formiration new uniq value
l_new_column_value := l_new_column_value || concat_text.part_value || ' & ';
end loop;
-- undate raw data
update test
set text = substr(l_new_column_value, 0, length(l_new_column_value)-3)
where id = x.id;
l_new_column_value := '';
end loop;
end;
Upvotes: 1