How to parse names and percents

Question

Import table

create table import 
(
    articleNumber c(20) primary key, 
    material char(70) 
);

insert into import 
values ('TEST', 'Main fabric: 50% Organic Cotton, 49% Cotton, 1% Elastane' )

The material column contains percents and names, in this format:

%,

The Product table contains those values in columns:

create table toode 
(
    originart char(20) primary key,
    mprots1 Numeric(9,3),
    mprots2 Numeric(9,3),
    mprots3 Numeric(9,3),
    materjal1 char(6),
    materjal2 char(6),
    materjal3 char(6),
    materjal4 char(6),
    materjal5 char(6) 
);

insert into toode (originart)  
values ('TEST')

How to update product table row with values from material string?

Updated row should contain

mprots1 = 50
mprots2 = 49
mprots3 = 1
materjal1 = Organi
materjal2 = Cotton
materria3 = Elasta

I'm using Postgres 16

Zegarek · Accepted Answer

Combining the other two threads, regexp_substr() can handle that:
_{demo at db<>fiddle}

insert into toode
select articleNumber
      ,regexp_substr(material,'(\d+)%',1,1,'',1)::numeric(9,3)
      ,regexp_substr(material,'(\d+)%',1,2,'',1)::numeric(9,3)
      ,regexp_substr(material,'(\d+)%',1,3,'',1)::numeric(9,3)
      ,regexp_substr(material,'% ([^,]+),?',1,1,'',1)
      ,regexp_substr(material,'% ([^,]+),?',1,2,'',1)
      ,regexp_substr(material,'% ([^,]+),?',1,3,'',1)
from import_
returning *;

originart	mprots1	mprots2	mprots3	materjal1	materjal2	materjal3	materjal4	materjal5
TEST	50.000	49.000	1.000	Organic Cotton	Cotton	Elastane	null	null

For percentages, that's at least one digit, followed by a %.

For material names:

You're looking for % followed by the material name
The material name is expected to be anything after that, up until the end or nearest comma. The [^,]+ means one or more character other than the comma, so it tolerates spaces, hyphenated names etc.
,? mean at most one comma, so that you don't need a trailing comma at the end to get a match

The parameters after the pattern are the same as in the post in the other thread.

1 tells it to start at the beginning
1, 2 or 3 means you want the 1st, 2nd or 3rd match
'' is where you'd add flags, like 'i' for case insensitive matching
1 means the first parenthesised sub-expression within the pattern, so it'll only catch the material name, without the leading % and the optional trailing comma.

How to parse names and percents

Answers (2)

Related Questions