PostgreSQL - splitting rows using regex

Question

I have got a table that has got colums ID, name and description. Table looks like:

ID    |  Name | Model
1     |  Ford | Focus 3
2.1-3 |  Opel | 1. Astra 2
      |       | 2. Vectra 2
      |       | 3. Vectra 3
3.1-2 | Toyota| Avensis 2; Micra 
4.1-2 |  Opel | (various versions) 1. Astra
      |       | (various versions) 2. Vectra
5.1-3 | Mazda | MX5; GTR; MX4

And I would like to split it into new rows using "regexp_split_to_table" To get result like this:

ID    |  Name | Description
1     |  Ford | Focus 3
2.1   |  Opel | Astra 2
2.2   |  Opel | Vectra 2
2.3   |  Opel | Vectra 3
3.1   | Toyota| Avensis
3.2   | Toyota| Micra
4.1   |  Opel | Astra
4.2   |  Opel | Vectra
5.1   | Mazda | MX5
5.2   | Mazda | GTR
5.3   | Mazda | MX4

How to do it using postresql and update main table after that?

Big thanks for your help!

Abelisto · Accepted Answer

Generally it is impossible for several reasons:

Your data is not ordered, for your example it is possible to have something like

ID    |  Name | Description
      |       | (year)  2. 2004
1     |  Ford | Some text
5.1-3 | Mazda | Petrol; 1.9; 3-doors
      |       | 2. Diesel
2.1-3 |  Opel | 1. Astra
      |       | 3. 2005
3     | Toyota| 2001; Petrol; 1.8 TDI
4.1-2 |  Opel | (model) 1. Vectra

Just try something like

drop table if exists my_ugly_table;
create table my_ugly_table as select generate_series(1,3) as x, generate_series(1,3) as y;
select * from my_ugly_table;
update my_ugly_table set y = 4 where x = 2;
select * from my_ugly_table;

and you will have

first result:

 x | y 
---+---
 1 | 1
 2 | 2
 3 | 3
(3 rows)

second result:

 x | y 
---+---
 1 | 1
 3 | 3
 2 | 4
(3 rows)

As you can see row order was changed.

Next, your goal is wrong at the point that you want to keep in the ID some valuable data like 1.2, 1.3 and so on. ID must be just a unique identifier of the row and nothing else. Ideally you have no any knowledge about values of IDs - them just exists.

However you can try do do something with your original data using plpgsql or something like:

First of all create table where we will performing something bad with our data:

create table models_t as select * from models m where 1=2;

It will create empty table models_t with structure same to table models.

Finally for them lets create truly PK:

alter table models_t add mt_id serial not null primary key;

Next lets fill it by data:

do language plpgsql $$
declare
  p_rec models;
  c_rec models;
begin
  p_rec := null;
  for c_rec in (select * from models) loop
    p_rec.id := coalesce(c_rec.id, p_rec.id);
    p_rec.name := coalesce(c_rec.name, p_rec.name);
    p_rec.description := c_rec.description;
    insert into models_t values (p_rec.id, p_rec.name, p_rec.description);
    raise notice '% %', c_rec.id, c_rec.name;
  end loop;
end; $$;

Only one result of it is that we have table without gaps like:

postgres=# select * from models_t;
  id   |  name  |     description      | mt_id 
-------+--------+----------------------+-------
 1     | Ford   | Some text            |     1
 2.1-3 | Opel   | 1. Astra             |     2
 2.1-3 | Opel   | 2. Diesel            |     3
 2.1-3 | Opel   | 3. 2005              |     4
 3     | Toyota | 2001; Ptrol; 1.8 TDI |     5
 4.1-2 | Opel   | (model) 1. Vectra    |     6
 4.1-2 | Opel   | (year) 2. 2004       |     7
 5.1-3 | Mazda  | Petrol; 1.9; 3-doors |     8
(8 rows)

Actually it is enough. However, lets parse our last data:

select 
  *, 
  substring(id from '(\d*)\.?.*') as main_id, -- First number before dot
  row_number() over (partition by substring(id from '(\d*)\.?.*')) as secondary_id -- Order inside previous value
from models_t;

Result:

  id   |  name  |     description      | mt_id | main_id | secondary_id 
-------+--------+----------------------+-------+---------+--------------
 1     | Ford   | Some text            |     1 | 1       |            1
 2.1-3 | Opel   | 1. Astra             |     2 | 2       |            1
 2.1-3 | Opel   | 2. Diesel            |     3 | 2       |            2
 2.1-3 | Opel   | 3. 2005              |     4 | 2       |            3
 3     | Toyota | 2001; Ptrol; 1.8 TDI |     5 | 3       |            1
 4.1-2 | Opel   | (model) 1. Vectra    |     6 | 4       |            1
 4.1-2 | Opel   | (year) 2. 2004       |     7 | 4       |            2
 5.1-3 | Mazda  | Petrol; 1.9; 3-doors |     8 | 5       |            1
(8 rows)

At that point we can use columns main_id and secondary_id to build wanted IDs like 1.1 or 2.3.

Everything other is up to you.

Good luck and have fun.

PostgreSQL - splitting rows using regex

Answers (1)

Related Questions