Andrus
Andrus

Reputation: 27955

How to parse names and percents

Import table

create table import 
(
    articleNumber c(20) primary key, 
    material char(70) 
);

insert into import 
values ('TEST', 'Main fabric: 50% Organic Cotton, 49% Cotton, 1% Elastane' )

The material column contains percents and names, in this format:

<percent>%<space><name>,

The Product table contains those values in columns:

create table toode 
(
    originart char(20) primary key,
    mprots1 Numeric(9,3),
    mprots2 Numeric(9,3),
    mprots3 Numeric(9,3),
    materjal1 char(6),
    materjal2 char(6),
    materjal3 char(6),
    materjal4 char(6),
    materjal5 char(6) 
);

insert into toode (originart)  
values ('TEST')

How to update product table row with values from material string?

Updated row should contain

mprots1 = 50
mprots2 = 49
mprots3 = 1
materjal1 = Organi
materjal2 = Cotton
materria3 = Elasta

I'm using Postgres 16

Upvotes: 0

Views: 72

Answers (2)

Zegarek
Zegarek

Reputation: 26347

Combining the other two threads, regexp_substr() can handle that:
demo at db<>fiddle

insert into toode
select articleNumber
      ,regexp_substr(material,'(\d+)%',1,1,'',1)::numeric(9,3)
      ,regexp_substr(material,'(\d+)%',1,2,'',1)::numeric(9,3)
      ,regexp_substr(material,'(\d+)%',1,3,'',1)::numeric(9,3)
      ,regexp_substr(material,'% ([^,]+),?',1,1,'',1)
      ,regexp_substr(material,'% ([^,]+),?',1,2,'',1)
      ,regexp_substr(material,'% ([^,]+),?',1,3,'',1)
from import_
returning *;
originart mprots1 mprots2 mprots3 materjal1 materjal2 materjal3 materjal4 materjal5
TEST 50.000 49.000 1.000 Organic Cotton Cotton Elastane null null

For percentages, that's at least one digit, followed by a %.

For material names:

  • You're looking for % followed by the material name
  • The material name is expected to be anything after that, up until the end or nearest comma. The [^,]+ means one or more character other than the comma, so it tolerates spaces, hyphenated names etc.
  • ,? mean at most one comma, so that you don't need a trailing comma at the end to get a match

The parameters after the pattern are the same as in the post in the other thread.

  • 1 tells it to start at the beginning
  • 1, 2 or 3 means you want the 1st, 2nd or 3rd match
  • '' is where you'd add flags, like 'i' for case insensitive matching
  • 1 means the first parenthesised sub-expression within the pattern, so it'll only catch the material name, without the leading % and the optional trailing comma.

Upvotes: 1

Dan Crews
Dan Crews

Reputation: 3637

Assuming you always have integers, it's something like this:

INSERT INTO toode (originart, mprots1, mprots2, mprots3, materjal1, materjal2, materjal3)
SELECT 
    'TEST' AS originart,
    COALESCE((regexp_matches(material, '\b(\d+)%'))[1]::NUMERIC, 0) AS mprots1,
    COALESCE((regexp_matches(material, ', (\d+)%', 'g'))[1]::NUMERIC, 0) AS mprots2,
    COALESCE((regexp_matches(material, ', (\d+)%', 'g'))[2]::NUMERIC, 0) AS mprots3,
    substring((regexp_matches(material, '(\w+\s?\w*): \d+%'))[1] FROM 1 FOR 6) AS materjal1,
    substring((regexp_matches(material, ', (\w+\s?\w*) \d+%', 'g'))[1] FROM 1 FOR 6) AS materjal2,
    substring((regexp_matches(material, ', (\w+\s?\w*) \d+%', 'g'))[2] FROM 1 FOR 6) AS materjal3
FROM source_table
WHERE id = your_row_id; 

Upvotes: 1

Related Questions